Open jonroberts opened 10 years ago
Lots of spam. The scripts still have trouble filtering out spam entries with short descriptions.
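One way the short-description case might be handled is to stop relying on description length alone and combine it with other signals, such as keyword hits and embedded links. A minimal sketch, assuming each dataset is a dict with hypothetical `title` and `description` keys; the keyword list and the threshold are illustrative and not taken from the actual scripts:

```python
import re

# Hypothetical keyword list; not taken from the actual scraping scripts.
SPAM_WORDS = {"viagra", "casino", "loan", "cheap", "discount", "pills"}
URL_RE = re.compile(r"https?://\S+")


def spam_score(dataset):
    """Return a rough spam score; higher means more likely spam."""
    title = dataset.get("title") or ""
    description = dataset.get("description") or ""
    text = (title + " " + description).lower()
    words = re.findall(r"[a-z0-9']+", text)

    score = 0
    # Keyword hits count regardless of how long the description is.
    score += 2 * sum(1 for w in words if w in SPAM_WORDS)
    # Links embedded in the title or description are a common spam signal.
    score += len(URL_RE.findall(text))
    # A very short description is only weak evidence, so it adds one point.
    if len(description.split()) < 5:
        score += 1
    return score


def is_probable_spam(dataset, threshold=3):
    """Flag a dataset when its combined score reaches the (assumed) threshold."""
    return spam_score(dataset) >= threshold
```

With this scoring, a short description by itself never crosses the threshold, so legitimate datasets with terse descriptions are kept unless other signals also fire.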
Giving up on datahub. When I started scraping there were ~6000 datasets. Two days ago there were ~10000. Today there are ~30000, and most of the new ones are spam. See span_digest.json for details.
Ouch. Looks like they're in a bad state. Thanks for the update.
J
On 10 October 2013 10:14, karvenlam <notifications@github.com> wrote:
> Giving up on datahub. When I started scraping there were ~6000 datasets. Two days ago there were ~10000. Today there are ~30000, and most of the new ones are spam. See span_digest.json for details.
Will run datarelations after NYC opendata completes.
datahub.io hosts a lot of datasets, but many are duplicates and the site isn't specifically energy-focused.
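If the duplicates are worth collapsing before any further processing, grouping by a normalized title is one cheap first pass. A rough sketch, assuming each scraped record is a dict with a hypothetical `title` key; it only catches duplicates whose titles differ in case, punctuation, or spacing:

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def group_duplicates(datasets):
    """Map each normalized title to the records sharing it (groups of 2+ only)."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[normalize_title(ds.get("title") or "")].append(ds)
    # Skip records with empty titles; they are not meaningful duplicates.
    return {key: items for key, items in groups.items() if key and len(items) > 1}
```

Records that land in the same group could then be reviewed by hand or reduced to one entry each.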