bcampbell / journalisted


Map of news IDs to names #2

Open symroe opened 13 years ago

symroe commented 13 years ago

(Let me know if this isn't the correct place to report these things)

The 'findArticles' API response contains a 'srcorg' ID, but I can't find a mapping from these IDs to source names. Also, some sources seem to have more than one ID: for example, IDs 4 and 11 both contain links to guardian.co.uk. I expect one is The Observer and the other is The Guardian, but it's not clear without manual checking.

Solutions:

I can't find URLs on the site containing these IDs either, so I guess they are just used by the scrapers or something behind the scenes.

I'd be happy to write a patch, but I'd need to get it working locally. If that's easy then I'll have a go...

:)

bcampbell commented 13 years ago

Hi Sym - sorry about the slow reply but I've been wading through churnalism stuff recently...

Anyway: srcorg refers to the organisation (really the publication) the article came from. There used to be only 20 or so in the system - the ones we scrape - but the list has exploded recently as I've started creating publications for the articles that users have submitted manually. I think we're up to about 700 so far. You can see a noddy list of them at: http://journalisted.com/publication (the ids in the urls are the srcorg ids)

Here's a list of the first 20 (the UK nationals, which we scrape):

 id |    shortname     |      prettyname      |               home_url               
----+------------------+----------------------+--------------------------------------
  1 | independent      | The Independent      | http://www.independent.co.uk
  2 | dailymail        | MailOnline           | http://www.dailymail.co.uk
  3 | express          | The Daily Express    | http://www.express.co.uk
  4 | guardian         | The Guardian         | http://www.guardian.co.uk
  5 | mirror           | The Mirror           | http://www.mirror.co.uk
  6 | sun              | The Sun              | http://www.thesun.co.uk
  7 | telegraph        | The Daily Telegraph  | http://www.telegraph.co.uk
  8 | times            | The Times            | http://www.timesonline.co.uk
  9 | sundaytimes      | The Sunday Times     | http://www.timesonline.co.uk
 10 | bbcnews          | BBC News             | http://news.bbc.co.uk
 11 | observer         | The Observer         | http://observer.guardian.co.uk
 12 | sundaymirror     | The Sunday Mirror    | http://www.mirror.co.uk
 13 | sundaytelegraph  | The Sunday Telegraph | http://www.telegraph.co.uk
 14 | skynews          | Sky News             | http://www.sky.com
 15 | scotsman         | The Scotsman         | http://www.scotsman.com
 16 | scotlandonsunday | Scotland on Sunday   | http://scotlandonsunday.scotsman.com
 18 | ft               | Financial Times      | http://www.ft.com
 19 | herald           | The Herald           | http://theherald.co.uk
 20 | notw             | News of the World    | http://www.newsoftheworld.co.uk/
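As a rough sketch of how a client could use the table above, the ids can be hard-coded into a lookup until an API call exists. The values are copied from the table; the names `SRCORG_NAMES` and `srcorg_name` are made up for illustration and are not part of journalisted. Id 17 is absent from the table, so it is left out here too.

```python
# srcorg id -> (shortname, prettyname), copied from the table above.
# SRCORG_NAMES and srcorg_name are hypothetical names, not part of any API.
SRCORG_NAMES = {
    1: ("independent", "The Independent"),
    2: ("dailymail", "MailOnline"),
    3: ("express", "The Daily Express"),
    4: ("guardian", "The Guardian"),
    5: ("mirror", "The Mirror"),
    6: ("sun", "The Sun"),
    7: ("telegraph", "The Daily Telegraph"),
    8: ("times", "The Times"),
    9: ("sundaytimes", "The Sunday Times"),
    10: ("bbcnews", "BBC News"),
    11: ("observer", "The Observer"),
    12: ("sundaymirror", "The Sunday Mirror"),
    13: ("sundaytelegraph", "The Sunday Telegraph"),
    14: ("skynews", "Sky News"),
    15: ("scotsman", "The Scotsman"),
    16: ("scotlandonsunday", "Scotland on Sunday"),
    18: ("ft", "Financial Times"),
    19: ("herald", "The Herald"),
    20: ("notw", "News of the World"),
}


def srcorg_name(srcorg_id):
    """Return the pretty name for a srcorg id, or a placeholder if unknown."""
    # int() is used in case the id arrives as a string (an assumption).
    entry = SRCORG_NAMES.get(int(srcorg_id))
    return entry[1] if entry else f"unknown publication ({srcorg_id})"
```

With this in place, `srcorg_name(4)` and `srcorg_name(11)` distinguish The Guardian from The Observer, and any id outside the first 20 falls through to the placeholder rather than raising an error.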

I'm happy to add an API to get at this stuff, although the actual data layout is still in a bit of flux, so it's likely to be a little unstable (although the basic ids/names should be pretty stable - I'll just be adding additional fields as I go).

And it's not too hard to get the site running locally - just a matter of me writing up some notes on doing it, which would be a good thing anyway! I see you're signed up for the Data and News Sourcing workshop in London tomorrow - want to meet up there and chat about some of this stuff?

symroe commented 13 years ago

Excellent, thanks. Don't worry about slow replies, this is all for a really silly personal project anyway!

I've more or less re-created that table manually, by looking through all the links I've got. An API call would be excellent at some point in the future though.
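For anyone wanting to automate that manual check, here is a minimal sketch that groups article links by their 'srcorg' id and collects the hostnames seen for each. It assumes each article is a dict with a `srcorg` key (named in this thread) and a `permalink` key holding the article URL; `permalink` is a guess at the field name, and the sample records are invented for illustration, not real API output.

```python
from collections import defaultdict
from urllib.parse import urlparse


def hosts_by_srcorg(articles):
    """Map each srcorg id to the set of hostnames its article links use."""
    hosts = defaultdict(set)
    for article in articles:
        # "permalink" is an assumed field name for the article URL.
        host = urlparse(article["permalink"]).netloc
        hosts[article["srcorg"]].add(host)
    return dict(hosts)


# Invented sample records, shaped like the assumption described above.
sample = [
    {"srcorg": 4, "permalink": "http://www.guardian.co.uk/a"},
    {"srcorg": 11, "permalink": "http://observer.guardian.co.uk/b"},
    {"srcorg": 4, "permalink": "http://www.guardian.co.uk/c"},
]

for srcorg, seen in sorted(hosts_by_srcorg(sample).items()):
    print(srcorg, sorted(seen))
```

Hostnames alone will not always separate two ids that share a site, so the output is a starting point for manual checking rather than a finished mapping.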

Yep, meeting up tomorrow would be excellent!