Open nicholasjhorton opened 5 years ago
The CSVs need a skip = 2 argument when they are read, because of junk lines at the top of each file.
Reading the data files may produce many warnings when values fail to parse into the assumed data types.
Any suspiciously small file (<1KB) probably errored out during scraping.
Example contents of such a file:
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
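A minimal sketch of how such files could be flagged, assuming the scraped files sit in one directory and end in `.csv`. The function names, the 1024-byte threshold (taken from the "<1KB" rule of thumb above), and the check for a leading HTML tag are illustrative assumptions, not part of the project:

```python
from pathlib import Path

# Assumption: "<1KB" from the issue is read as "fewer than 1024 bytes".
SIZE_THRESHOLD = 1024


def looks_like_failed_scrape(path, threshold=SIZE_THRESHOLD):
    """Return True if the file is small and starts like an HTML page."""
    path = Path(path)
    if path.stat().st_size >= threshold:
        return False
    # Look only at the start of the file; error pages begin with an HTML tag.
    head = path.read_bytes()[:256].lstrip().lower()
    return head.startswith(b"<html") or head.startswith(b"<!doctype html")


def find_failed_scrapes(directory, pattern="*.csv"):
    """List files under `directory` that look like saved error pages."""
    return sorted(
        p for p in Path(directory).glob(pattern) if looks_like_failed_scrape(p)
    )
```

Requiring both conditions keeps a small but legitimate CSV from being flagged. Dropping the HTML check would instead match the broader "anything under 1KB" heuristic.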
These points should be documented well.
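As a starting point for that documentation, here is a hedged Python sketch of the first point (skipping the junk lines at the top of each CSV). It assumes the header row comes immediately after the skipped lines and that the files are UTF-8. The function name `read_csv_skipping` is hypothetical, and the `skip` parameter only mirrors the skip = 2 argument mentioned above:

```python
import csv
from itertools import islice


def read_csv_skipping(path, skip=2):
    """Read a CSV into a list of dicts, discarding the first `skip` lines."""
    # Assumption: UTF-8 encoding; adjust if the scraped files differ.
    with open(path, newline="", encoding="utf-8") as f:
        # islice drops the junk lines; the next line is treated as the header.
        reader = csv.DictReader(islice(f, skip, None))
        return list(reader)
```

This version leaves every value as a string, so it does not raise type-parsing warnings itself. Any conversion to numeric or date types would be a separate step, and that step is where the mismatches noted above would show up.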