ebrucucen closed this issue 3 years ago.
These scripts should be launched from the root of the project, where the `data` folder is located. I think that's the origin of the error message.
Fair comment, Alex. I think a consistent approach across all scripts would definitely be better. For example, the carbon_sense and cato-institute scripts write their output to the current directory:
df.to_csv("carbon_sense.csv", index=False)
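One way to make every script write to the same place, regardless of where it is launched from, is to resolve the output path from the script's own location instead of the current working directory. This is a minimal sketch, assuming the project root can be identified as the nearest parent directory that contains a `data` folder; `find_project_root` and `output_path` are hypothetical helpers, not existing code in the repo:

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Walk up from `start` until a directory containing a `data` folder is found.

    Assumption (hypothetical convention): the project root is the nearest
    ancestor that has a `data` directory.
    """
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "data").is_dir():
            return candidate
    raise FileNotFoundError(f"No 'data' folder found above {start}")


def output_path(filename: str, start: Path) -> Path:
    """Return <project root>/data/<filename>, independent of the working directory."""
    return find_project_root(start) / "data" / filename


# Hypothetical usage inside a scraper script:
# df.to_csv(output_path("carbon_sense.csv", Path(__file__).parent), index=False)
```

With this, running a script from the project root or from its own folder produces the same file location.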
Taking your suggestion into account, we need a run script to guide us (especially newcomers like me). If you agree, I will close this issue and open another one for a run script.
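A run script could simply launch each scraper with the project root as its working directory, so relative paths such as `data/...` always resolve the same way. This is a minimal sketch, not an existing file in the repo; `run_scrapers` and the `SCRAPERS` entry below are hypothetical, and the real script paths would need to be filled in:

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence

# Hypothetical list of scraper scripts, relative to the project root.
SCRAPERS = [
    "scrapers/breibart_defense.py",
]


def run_scrapers(project_root: Path, scripts: Sequence[str]) -> None:
    """Run each scraper with the project root as the working directory.

    Relative paths such as data/... inside the scripts then resolve
    consistently, no matter where the run script itself is launched from.
    """
    root = project_root.resolve()
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError at the first script that fails.
        subprocess.run([sys.executable, str(root / script)], cwd=root, check=True)


# Hypothetical usage, assuming the run script sits in the project root:
# run_scrapers(Path(__file__).parent, SCRAPERS)
```

Stopping at the first failure keeps error messages like the one in this issue visible instead of letting later scrapers bury them.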
Sure, why not. But we should also bear in mind that these scripts were written for one-off scraping rather than regular runs. We might want to rescrape regularly (I'm not sure about that); in that case, we would need to rewrite the scripts so that they don't download everything again. Having a consistent approach set up would be great if we need to do more scraping.
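To avoid downloading everything again on each run, a scraper could first check which links are already saved and only fetch the new ones. This is a minimal sketch using the standard `csv` module, assuming the output CSV has hypothetical `link` and `article` columns; the helper names are illustrative, not part of the existing scripts:

```python
import csv
from pathlib import Path
from typing import Iterable, List


def load_seen_links(csv_path: Path, link_column: str = "link") -> set:
    """Return the links already saved in csv_path (empty set if the file doesn't exist yet)."""
    if not csv_path.exists():
        return set()
    with csv_path.open(newline="", encoding="utf-8") as f:
        return {row[link_column] for row in csv.DictReader(f)}


def new_links_only(candidate_links: Iterable[str], csv_path: Path) -> List[str]:
    """Keep only links not scraped in a previous run, preserving order and dropping duplicates."""
    seen = load_seen_links(csv_path)
    fresh = []
    for link in candidate_links:
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


def append_articles(rows: List[dict], csv_path: Path) -> None:
    """Append newly scraped rows, writing the header only when the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["link", "article"])
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

A scraper would collect candidate links from the listing pages, pass them through `new_links_only`, download only those articles, and then call `append_articles`. Appending rather than overwriting keeps earlier results intact between runs.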
I see your point. So would regular scraping be the next step, once we want "live" data consumption and to track "new" news about climate misinformation?
Well, that's a good point. I think it would be good to discuss this at the next meetup. Unfortunately I missed the last one, so someone else might be better placed to say what the current priorities are, but judging by all the activity, I understand that people are labelling the data. That is in line with the idea of having a working product as soon as possible and then seeing how to improve it (an idea that seemed to have good support at the last meetup I attended, including mine).
That being said, it wouldn't hurt to think longer term and try to make things (such as the earlier parts of the pipeline) work smoothly. So maybe we should start setting up our scraping scripts so that they can run on a regular basis. Perhaps we should discuss this on Slack and see what people think.
[x] What is the trigger: Running the breibart-defense script causes the error message.
[x] What is the error message:
[x] What is the expected behaviour:
The /data/breibart-defense.csv file should be populated with the links and articles.