freedomofpress / securethenews

An automated scanner and web dashboard for tracking TLS deployment across news organizations
https://securethe.news
GNU Affero General Public License v3.0

Adding WikiData query for Newspaper site urls and opportunity for new filter and display criteria/info #173

Open brierjon opened 6 years ago

brierjon commented 6 years ago

Secure The News states that it covers "major news sites", but it does not say what the selection criteria are. The inclusion criteria should be clarified first, along with whatever resource limits constrain how many sites the scanner can cover, because coverage of monitored news sites could easily be increased with a periodic Wikidata query.

Why a Wikidata query? More sites, more filter criteria, and shared data for improved coverage. I wrote a query against Wikidata (CC0 license) for newspapers: http://tinyurl.com/ychz3z25. As of September 14, 2018, it returns 20,512 newspapers (in various formats). Of these, 5,028 have an official website recorded, a number that could improve over time with some work on Wikidata. Of these, 2,310 specify a format.
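To make the idea concrete, here is a minimal sketch of what a periodic query could look like. It is not the query behind the short link above; it assumes the Wikidata identifiers Q11032 (newspaper), P31 (instance of), P279 (subclass of), and P856 (official website), plus the public SPARQL endpoint. The function names and the sample response are hypothetical, made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid="Q11032"):
    """Build a SPARQL query for items of a class that have an official website.

    Q11032 is assumed to be the "newspaper" class; pass another QID to
    query a different category.
    """
    return (
        "SELECT ?item ?itemLabel ?website WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} ;\n"
        "        wdt:P856 ?website .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def build_request_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "json"}
    )


def extract_sites(response_json):
    """Turn a SPARQL JSON response into simple dicts for a site importer."""
    rows = response_json["results"]["bindings"]
    return [
        {
            # Entity URIs end in the QID, e.g. .../entity/Q123.
            "wikidata_id": row["item"]["value"].rsplit("/", 1)[-1],
            "name": row.get("itemLabel", {}).get("value", ""),
            "url": row["website"]["value"],
        }
        for row in rows
    ]


if __name__ == "__main__":
    # Made-up response in the standard SPARQL JSON shape; not real data.
    sample = {
        "results": {
            "bindings": [
                {
                    "item": {"value": "http://www.wikidata.org/entity/Q123"},
                    "itemLabel": {"value": "Example Gazette"},
                    "website": {"value": "https://www.example.com/"},
                }
            ]
        }
    }
    print(build_request_url(build_query()))
    print(extract_sites(sample))
```

Keeping query construction and response parsing separate from the HTTP call means the import could be tested offline and scheduled however the project prefers.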

Query for news channels: http://tinyurl.com/ydh9v47q (44 results). Query for news magazines: http://tinyurl.com/ycosdevh (20 results). Query for student newspapers: http://tinyurl.com/yd778rol (20 results). Most likely the larger newspaper result set already includes some of these narrower categories, so some deduplication and categorization will be needed.
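One possible approach to the deduplication is to key every result on a normalized hostname and record which category queries returned it. The sketch below assumes each query yields a plain list of official-website URLs; the function names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_host(url):
    """Return a lowercase hostname without a leading "www.", or "" if none."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def merge_by_host(results_by_category):
    """Merge per-category URL lists into {hostname: sorted category names}.

    A site returned by several queries appears once, tagged with every
    category it was found under. URLs without a parseable hostname are skipped.
    """
    merged = {}
    for category, urls in results_by_category.items():
        for url in urls:
            host = normalize_host(url)
            if not host:
                continue
            merged.setdefault(host, set()).add(category)
    return {host: sorted(categories) for host, categories in merged.items()}


if __name__ == "__main__":
    # Made-up URLs for illustration only.
    print(merge_by_host({
        "newspaper": ["https://www.example.com/", "http://example.org"],
        "student newspaper": ["http://example.com/news"],
    }))
```

Hostname matching is a deliberately coarse choice: it treats `www.example.com` and `example.com` as the same site, but it would also merge distinct publications hosted under one domain, so a real import might need a manual review step.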

Wikidata integration would also open up other ways to sort and present the list of news sites, e.g. showing university-based newspapers, founding dates, primary locations, or languages of publication.

Before I dive into this, I wanted to scope what could or should be added to the site list, and then go from there on the import/integration.