drmikecrowe / mbfcext

Media Bias Fact Check extension
https://drmikecrowe.github.io/mbfcext/
MIT License

Missing sources in the extension that can be found on the website #37

Closed: sarmadchandio closed this issue 7 months ago

sarmadchandio commented 1 year ago

I was scraping data from MBFC for some research work and came across your extension and the data in docs/v5/data/sources-pretty.json. This is exactly the format I was planning to scrape myself. However, I found that sources-pretty.json is missing data for 1075 sources that the website now lists. You can find the "missing-sources.json" file here: https://github.com/sarmadchandio/mbfc-scrapper/tree/main

Do you have any idea why this might be happening? Rather than scraping the website every 3-4 days to get updates, I would much prefer to use your data dump for my analysis. I would appreciate it if you could look into this.

I compared my collected URLs against sources-pretty.json rather than combined.json (which was updated 13 hours before I wrote this comment) because both files contain the same source data.
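For anyone reproducing this check, the comparison amounts to a set difference between the scraped URLs and the URLs in the dataset. Below is a minimal sketch, assuming both sides have already been loaded as plain lists of URL strings (the thread does not describe the internal structure of sources-pretty.json, so extracting URLs from it is left out). The helpers `normalize_url` and `find_missing` are hypothetical names, and the normalization rules (lowercase host, drop a leading `www.`, ignore the scheme and trailing slash) are an assumption chosen so trivial formatting differences are not counted as missing sources.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable host+path key (hypothetical normalization)."""
    url = url.strip()
    # Add a scheme if missing so urlsplit places the host in netloc.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat "www.example.com" and "example.com" as the same source.
    if host.startswith("www."):
        host = host[4:]
    # Ignore the scheme and any trailing slash on the path.
    return host + parts.path.rstrip("/")


def find_missing(scraped_urls, dataset_urls):
    """Return sorted normalized keys of scraped URLs absent from the dataset."""
    known = {normalize_url(u) for u in dataset_urls}
    # A set removes duplicates that normalize to the same key.
    return sorted({normalize_url(u) for u in scraped_urls} - known)


scraped = ["https://www.example.com/", "http://news.example.org/path/", "another.net"]
dataset = ["example.com", "https://news.example.org/path"]
print(find_missing(scraped, dataset))
```

Normalizing before comparing matters here: if the two sides format URLs differently, a raw string comparison would inflate the count of "missing" sources.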

Let me know if anything is unclear.

drmikecrowe commented 7 months ago

This was completed last year. Please open a new issue if you still see problems.