bcampbell / journalisted


Journalist from covered site not showing in results #3

Open doubi opened 12 years ago

doubi commented 12 years ago

The Telegraph's Tom Whitehead isn't showing in search results. His Telegraph profile page is here:

http://www.telegraph.co.uk/journalists/tom-whitehead/

There is, however, a Tom Whitehead on journalisted, listed as working for the Daily Express, although his last contributions to that paper were in 2007.

At first I thought this might be a job move issue: is the code refusing to recognise his work at the Telegraph as there's an already existing Tom Whitehead entity?

But, unless there's some common FOAF data or something on both the Express and Telegraph journalist profile pages, which I doubt, that would mean that journalisted essentially can't handle two journalists having the same name, which can't possibly be the case.

Anything else I can do to help research the issue?

bcampbell commented 12 years ago

Ahh... looks like the bylines in the Telegraph aren't being picked up properly :-( At least one of Tom Whitehead's articles is in there (http://journalisted.com/article/2tali), but it's not linked to him. I'll get it fixed.

FOAF information? If only ;-) The web is a mess. Even just getting a list of articles from any given newspaper site can be a major undertaking.

The current policy on matching up bylined names to journalisted profiles is: if the name matches a profile, and the article is from a publication they've written for previously, it's assumed to be the same person. If the name matches a profile, but we have nothing on file for them from that publication, it's assumed it's a different person and the system will create a new profile for them.

This check is bypassed if the article was manually submitted - the system knows which profile is expected, and will attribute the article to the profile if the name matches, even if the journo has no previous articles on file for that publication.

Of course, we manually merge and split profiles when we find out there's been a mistake - which we'll likely have to do for this case once it's all working again!
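
As a rough illustration of the matching policy described above, here is a minimal Python sketch. It is not journalisted's actual code: the `Profile` class, the `attribute_byline` function and the `expected` parameter are all hypothetical names made up for this example, and real name matching is presumably fuzzier than an exact string comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for a journalisted profile."""
    name: str
    publications: set = field(default_factory=set)


def attribute_byline(name, publication, profiles, expected=None):
    """Return the profile an article should be attached to.

    `expected` models a manual submission, where the target profile
    is already known (an assumption made for this sketch).
    """
    # Manual submission: only the name has to match, even if the
    # journalist has nothing on file for this publication.
    if expected is not None and expected.name == name:
        expected.publications.add(publication)
        return expected

    # Automatic case: the name must match AND the profile must already
    # have articles from the same publication.
    for profile in profiles:
        if profile.name == name and publication in profile.publications:
            return profile

    # Same name but no history at this publication (or no match at all):
    # assume a different person and create a new profile.
    new_profile = Profile(name, {publication})
    profiles.append(new_profile)
    return new_profile
```

Under this sketch, a "Tom Whitehead" byline from the Telegraph would get a fresh profile rather than being attached to the existing Daily Express one, which is why a manual merge may be needed afterwards.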

bcampbell commented 12 years ago

Doh. Turns out it was a bug in the byline parser, which was rejecting his name because it thought it was a job title. I've added his latest article to his profile, so future Telegraph articles should be attributed just fine. There are a whole bunch of his Telegraph articles still not attached to his profile. We could manually go through and sort them out, but I think it's a more general problem - there will be other affected journos, so I'll need to write a little tool to go back and sort things out.
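
The thread doesn't say which rule in the byline parser misfired, so the following is only a guess at the general class of bug, not the actual fix. It assumes a job-title filter that matches keywords as substrings, with a made-up keyword list (`JOB_TITLE_WORDS`) and made-up function names, and contrasts that with whole-word matching.

```python
import re

# Hypothetical keyword list; the real parser's rules are not shown in this thread.
JOB_TITLE_WORDS = ("editor", "correspondent", "head", "chief")


def looks_like_job_title_naive(text):
    """Substring check: flags any text containing a keyword anywhere."""
    lowered = text.lower()
    return any(word in lowered for word in JOB_TITLE_WORDS)


def looks_like_job_title(text):
    """Whole-word check: only flags text where a keyword is its own word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in JOB_TITLE_WORDS for token in tokens)
```

With these definitions the substring version flags "Tom Whitehead" (because of "head"), while the whole-word version does not. Whole-word matching would still misfire on a surname that is itself a keyword, so a real parser would need more context than this.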