KaiDMML / FakeNewsNet

This is a dataset for fake news detection research

Speeding up the download #19

Closed SaschaStenger closed 5 years ago

SaschaStenger commented 5 years ago

I have been running the code nonstop for about two weeks now, and I get the feeling that it will take even longer before the dataset is ready.

When I posted a question about data collection limits on the Twitter dev forum, it was pointed out that the code uses a suboptimal lookup for gathering tweets (see the forum post). I wanted to bring this to your attention so that the collection process can be sped up for everyone using this dataset.

mdepak commented 5 years ago

@SaschaStenger Thanks for investigating the issue and letting us know about the improvements. If you have already made the change to speed up the download, you can create a pull request; otherwise I will make the changes accordingly.

SaschaStenger commented 5 years ago

@mdepak I have updated the tweet collection process so that it now requests up to 100 tweets per call. I also implemented the fix concerning the c.long on Windows. Both are implemented in the cloned repo.
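To illustrate the batching idea, here is a minimal sketch of splitting tweet IDs into groups of at most 100 and making one lookup call per group. This is not the code from the repo: `chunked`, `fetch_all`, and the `lookup_batch` callable are hypothetical names, and `lookup_batch` stands in for whatever function actually performs the batch request against the Twitter API.

```python
from typing import Callable, Dict, Iterator, List, Sequence

# Assumption: the batch lookup call accepts at most 100 tweet IDs per request.
BATCH_SIZE = 100


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_all(
    tweet_ids: Sequence[str],
    lookup_batch: Callable[[List[str]], Dict[str, dict]],
) -> Dict[str, dict]:
    """Collect tweets with one `lookup_batch` call per chunk of IDs.

    `lookup_batch` is a hypothetical callable that takes a list of tweet IDs
    and returns a mapping from tweet ID to tweet payload. IDs that the
    lookup does not return (e.g. deleted tweets) are simply absent.
    """
    tweets: Dict[str, dict] = {}
    for batch in chunked(tweet_ids):
        tweets.update(lookup_batch(batch))
    return tweets
```

With this shape, 250 tweet IDs need 3 calls instead of 250, which is where the speedup under a per-call rate limit would come from.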

mdepak commented 5 years ago

@SaschaStenger Thank you for suggesting a more efficient Twitter API call. Fixed the issue in https://github.com/KaiDMML/FakeNewsNet/commit/3c1ae3c41b32845243db08cac4ec9a9f7c7a43b3

SaschaStenger commented 5 years ago

@mdepak Thank you for fixing the issue. If I had known how pull requests work, I would have liked to handle it that way (I'm new to contributing to projects like this). Also, your change of moving the chunking into the utils package seems like a more elegant solution.