Figure out feature overlap with Dexter

pudo commented 10 years ago

This seems to do mass retrieval of news articles, then entity extraction and storage.

Grano will focus on sets of entities (some of which may not be in Calais), but some users (Poderopedia, SF) may want to find mentions of these entities in a set of publications and then extract structured knowledge from specific articles. Could Dexter be helpful in this scenario, @longhotsummer?

longhotsummer commented 10 years ago

Dexter's value-add is not really the entity extraction (which is done by third-party APIs) or the article crawling; it's more what it lets human users enter about an article. It's currently focused on classifying who speaks in the media and what issues are raised, with a bent towards SA's elections.

We discovered that the machine learning data was too dirty to rely on alone, so human monitors use it as a starting point and add missing voices or remove erroneous ones.
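As a rough illustration of that workflow (not Dexter's actual code), the review step can be thought of as set arithmetic: start from the machine-extracted voices, add the ones a monitor found missing, and drop the ones a monitor flagged as wrong. The function name and arguments below are hypothetical.

```python
def apply_corrections(extracted, added, removed):
    """Return the reviewed set of voices for one article.

    extracted: names proposed by the third-party extraction API
    added:     names a human monitor added because the API missed them
    removed:   names a human monitor rejected as erroneous
    (All three names are illustrative assumptions, not Dexter's API.)
    """
    return (set(extracted) | set(added)) - set(removed)


# The machine output is only the starting point; the monitor has the last word.
reviewed = apply_corrections(
    extracted={"Person A", "Person B"},
    added={"Person C"},
    removed={"Person B"},
)
```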

Our data is biased heavily towards politicians and their political affiliations (i.e. not business affiliations).

There is definitely room to build on Dexter to capture other details about people (e.g. non-political affiliations) but the human factor is pretty key.

pudo commented 10 years ago

That makes absolute sense - when I said "extract structured knowledge", I didn't necessarily mean for that to imply machine learning or things like that. In fact, one of our potential partners works with a news team who would most likely want their reporters to tag articles right out of the CMS to link them into an entity database.

longhotsummer commented 10 years ago

From working on Dexter I've realised there's interesting data beyond just entity extraction. For instance (these have a media monitoring bias, but still apply):

- Who is actually being quoted in the press vs. just mentioned?
- What affiliations do they have when they are quoted? Are they experts, eyewitnesses, representatives, etc.?
- Children are often unnamed when quoted, so entity extraction misses them, but there's still data there (age, gender, race, affiliation).
- Third-party entity extraction is often Western-biased and so misses a lot of e.g. African names and places (or gets them very confused).

It all depends heavily on your use case (unsurprisingly :)
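To make those points concrete, here is a minimal sketch of what a per-article source record could look like. It is an assumption for illustration only, not Dexter's or Grano's schema; every field name is hypothetical. The main design point is that the name is optional, so an unnamed child can still be recorded with the attributes that are known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceMention:
    """One person appearing in an article (hypothetical schema)."""
    name: Optional[str]                 # None for unnamed sources, e.g. children
    quoted: bool                        # True if quoted, False if only mentioned
    role: Optional[str] = None          # e.g. "expert", "eyewitness", "representative"
    affiliation: Optional[str] = None
    age: Optional[int] = None
    gender: Optional[str] = None
    race: Optional[str] = None


def quoted_sources(mentions: List[SourceMention]) -> List[SourceMention]:
    """Keep only the people who were actually quoted."""
    return [m for m in mentions if m.quoted]


mentions = [
    SourceMention(name="Person A", quoted=True, role="expert"),
    SourceMention(name="Person B", quoted=False),
    SourceMention(name=None, quoted=True, age=9, gender="female"),
]
voices = quoted_sources(mentions)
```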

Greg


pudo commented 10 years ago

Awesome, thanks for sharing these learnings! For what it's worth: have you been in touch with what folks at MIT Civic's MediaCloud are doing?

In any case, this seems clarified now, closing.

miguelpaz commented 10 years ago

Hi guys, Miguel here from Poderopedia. I just discovered Dexter. Love it. In our use case, we are data editors and journalists who gather, check, and confirm information, then add to our platform the entities and connections we find in news sources and databases, which we cite as sources. As a small team, we have a huge need to save as much time as possible on searching for data, filtering what is relevant (maybe by the percentage of times an entity is mentioned in a story, or whether it appears in the headline and alongside which other terms or issues?), and adding it to the platform, so the team can spend less time on that and more on qualitative reporting. Being able to set up media monitoring for the same purposes would also be a tremendous help. I can see Dexter being of tremendous value for those things, or am I wrong?
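The relevance filter floated above (share of mentions in a story, presence in the headline) could be sketched roughly as follows. This is a naive illustration using plain string matching, not something Dexter is known to provide; the function name and the returned keys are hypothetical.

```python
import re


def mention_shares(names, headline, body):
    """For each tracked name, report how prominent it is in one article.

    Returns a dict: name -> {"in_headline", "mentions", "share"}, where
    "share" is that name's fraction of all tracked-name mentions in the body.
    Matching is a case-insensitive whole-word search (a simplifying assumption).
    """
    patterns = {
        n: re.compile(r"\b" + re.escape(n) + r"\b", re.IGNORECASE) for n in names
    }
    counts = {n: len(p.findall(body)) for n, p in patterns.items()}
    total = sum(counts.values())
    return {
        n: {
            "in_headline": bool(patterns[n].search(headline)),
            "mentions": counts[n],
            "share": counts[n] / total if total else 0.0,
        }
        for n in names
    }


stats = mention_shares(
    ["Jane Doe", "Acme"],
    headline="Jane Doe resigns",
    body="Jane Doe left Acme. Acme declined to comment. Jane Doe said nothing.",
)
```

A small team could then sort incoming articles by headline presence first and mention share second, and review only the top of the list by hand.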

ANCIR / grano

Figure out feature overlap with Dexter #63
