Data4Democracy / internal-displacement

Studying news events and internal displacement.

Generate a reliability score for a given article #65

Open simonb83 opened 7 years ago

simonb83 commented 7 years ago

In some contexts, information about IDPs is highly politicized, which could be problematic if you're drawing from media reports. You'd want to be very careful in selecting which sources you used for info about the Rohingya in Myanmar, for example.

It would be good to be able to score an article for reliability, to help analysts as they analyze and interpret the extracted data. In some cases, news sources may be government-run, 'fake news', or have poor sourcing or a poor track record, so any data extracted from these sources should be identifiable as having potential issues.

On the front end, this could include a filter that lets analysts select either all articles or only those with a reliability score above a certain threshold.
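A minimal sketch of what such a filter might look like on the data side. The `Article` class, its `reliability` field, and the 0 to 1 score range are assumptions made for illustration, not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Article:
    """Hypothetical article record; only the fields needed for filtering."""
    url: str
    reliability: float  # assumed to be a score in [0.0, 1.0]


def filter_articles(articles: Iterable[Article],
                    min_reliability: Optional[float] = None) -> List[Article]:
    """Return all articles, or only those scoring at or above the threshold.

    Passing None means "no filter", i.e. the analyst selected all articles.
    """
    if min_reliability is None:
        return list(articles)
    return [a for a in articles if a.reliability >= min_reliability]
```

Whether the comparison should be strict (`>`) or inclusive (`>=`) is a design choice; the inclusive form is used here only as an example.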

Some thoughts for implementation include:

  1. A maintainable list of known problematic sources
  2. Measuring similarity of reported facts between sources
  3. A maintainable list of highly trusted and common 'core' news sources; anything from these sources automatically gets a high reliability rating.
  4. New or unknown sources automatically get a lower rating, unless their facts are similar enough to a report from a highly trusted source.
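The four ideas above could be combined roughly as follows. This is only a sketch: the domain names are placeholders, the numeric ratings and the similarity threshold are invented for illustration, and representing "facts" as a set of strings compared by Jaccard overlap is an assumption standing in for whatever the extraction pipeline actually produces.

```python
from typing import Iterable, Set
from urllib.parse import urlparse

# Hypothetical, hand-maintained lists; the domains are placeholders.
TRUSTED_SOURCES: Set[str] = {"trusted.example.org"}
PROBLEMATIC_SOURCES: Set[str] = {"unreliable.example.com"}

# Illustrative values only; nothing here has been agreed on.
HIGH, CORROBORATED, LOW_UNKNOWN, LOW_PROBLEMATIC = 0.9, 0.7, 0.4, 0.1
SIMILARITY_THRESHOLD = 0.5


def source_domain(url: str) -> str:
    """Lower-cased host name of the URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def fact_similarity(facts_a: Set[str], facts_b: Set[str]) -> float:
    """Jaccard overlap between two sets of extracted facts."""
    if not facts_a or not facts_b:
        return 0.0
    return len(facts_a & facts_b) / len(facts_a | facts_b)


def reliability_score(url: str, facts: Set[str],
                      trusted_reports: Iterable[Set[str]] = ()) -> float:
    """Preliminary rule-based score for one article."""
    domain = source_domain(url)
    if domain in PROBLEMATIC_SOURCES:      # idea 1: known problematic source
        return LOW_PROBLEMATIC
    if domain in TRUSTED_SOURCES:          # idea 3: trusted 'core' source
        return HIGH
    # Ideas 2 and 4: an unknown source starts low, but is rated higher if
    # its facts are close enough to a report from a trusted source.
    best = max((fact_similarity(facts, r) for r in trusted_reports),
               default=0.0)
    return CORROBORATED if best >= SIMILARITY_THRESHOLD else LOW_UNKNOWN
```

For example, `reliability_score("http://new.example.net/x", {"a", "b"}, [{"a", "b", "c"}])` would return the `CORROBORATED` value, since the overlap is 2/3.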

georgerichardson commented 7 years ago

We could also look at links in the text to see what other sources they cite.
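As a rough sketch of the link idea, the standard library is enough to collect the external domains an article links to. The function name `cited_domains` is hypothetical, and this assumes the raw HTML of the article is available.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def cited_domains(html: str, article_url: str) -> Set[str]:
    """Return the set of external host names linked from the article."""
    own_host = urlparse(article_url).hostname
    parser = _LinkCollector()
    parser.feed(html)
    domains: Set[str] = set()
    for href in parser.hrefs:
        # Resolve relative links against the article URL first.
        host = urlparse(urljoin(article_url, href)).hostname
        # Skip links without a host (e.g. mailto:) and links to the same site.
        if host and host != own_host:
            domains.add(host)
    return domains
```

The resulting domains could then be checked against whatever trusted or problematic source lists end up being maintained.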

Are you envisioning that this kind of score would be hard-coded, or that there would also be an element of learning from an analyst who verifies the sources?

simonb83 commented 7 years ago

I'm not really sure yet, but likely some sort of combination.

Probably initially some hard coded rules to generate a preliminary score that can then be verified by an analyst and updated if need be.

If the 'rules' include some sort of whitelist or blacklist for certain sources, then these lists could be updated automatically as analysts verify the sources.
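Something along these lines might work for the feedback loop. The names `SourceLists`, `record_verdict`, and `final_score` are hypothetical, and persistence (database, file, etc.) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SourceLists:
    """Hypothetical in-memory whitelist and blacklist of source domains."""
    trusted: Set[str] = field(default_factory=set)
    problematic: Set[str] = field(default_factory=set)

    def record_verdict(self, domain: str, reliable: bool) -> None:
        """Move a domain to the list matching the analyst's verdict."""
        if reliable:
            self.problematic.discard(domain)
            self.trusted.add(domain)
        else:
            self.trusted.discard(domain)
            self.problematic.add(domain)


def final_score(preliminary: float,
                analyst_score: Optional[float] = None) -> float:
    """Use the analyst's score when one exists, else the rule-based one."""
    return preliminary if analyst_score is None else analyst_score
```

Here a single verdict is enough to move a domain between lists; requiring several consistent verdicts before changing a list would be a more conservative alternative.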

Definitely later down the line with enough hand-reviewed articles, it would be interesting to try and apply ML and see what sort of features might help distinguish articles.