fandangOrg / fandango

FAke News discovery and propagation from big Data ANalysis and artificial intelliGence Operations

Fix for authors that represent groups #91

Closed: pstalidis closed this issue 3 years ago

pstalidis commented 3 years ago

The end users have noted that author.name sometimes represents a group of authors rather than a single person; for example, the author name might be "Newsroom".

The main problem is that if a "Newsroom" author appears in multiple sources, we have to distinguish between them.

To solve this, the end users will go through a list of author names and mark each one as either a single author or a group author (e.g. "Newsroom").

The author names marked as groups should then be treated differently in the Author analysis module, by appending (or prepending) the source domain name.
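A minimal sketch of the proposed treatment, assuming the end users' markings end up as a simple set of group names. The set GROUP_AUTHOR_NAMES, the second example entry in it, the function author_display_key and the "Name (domain)" format are all hypothetical, not part of the actual Author analysis module:

```python
# Hypothetical set of names the end users have marked as group authors,
# stored lowercased so the lookup is case-insensitive.
GROUP_AUTHOR_NAMES = {"newsroom", "editorial team"}


def author_display_key(name: str, source_domain: str) -> str:
    """Return the key a (hypothetical) Author analysis step would use.

    Names marked as groups get the source domain appended, so "Newsroom"
    at one outlet stays separate from "Newsroom" at another. Names of
    single authors are returned unchanged (apart from trimming spaces).
    """
    cleaned = name.strip()
    if cleaned.lower() in GROUP_AUTHOR_NAMES:
        return f"{cleaned} ({source_domain})"
    return cleaned
```

For example, author_display_key("Newsroom", "orga.example") and author_display_key("Newsroom", "orgb.example") yield two distinct keys, while a single author such as "Jane Doe" keeps the same key across sources. Prepending the domain instead would only change the format string.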

pstalidis commented 3 years ago

Breaking this down into specific actions: @neilpbyrne, can you provide a list of all the unique author.name values that appear in the ingested articles, so that we can give it to the end users?
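One way to produce that list offline, sketched under the assumption that the ingested articles can be exported as dictionaries with a nested author.name field (the export format and the function unique_author_names are hypothetical). When querying Elasticsearch directly, a terms aggregation on the author name field would do the same job server-side.

```python
from typing import Any, Iterable, List, Mapping


def unique_author_names(articles: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect the distinct, non-empty author names, sorted for review.

    Assumes each article looks like {"author": {"name": "..."}}; articles
    without an author or with a blank name are skipped.
    """
    names = set()
    for article in articles:
        author = article.get("author") or {}
        name = (author.get("name") or "").strip()
        if name:
            names.add(name)
    return sorted(names)
```

Trimming whitespace before deduplicating keeps "Newsroom" and "Newsroom " from appearing as two entries in the list handed to the end users.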

dmgutierrez commented 3 years ago

To generate the unique identifier of an author, I use the name together with the source domain. For example, suppose we receive two articles:

Since the identifier is based on both the name and the source domain, two different entries will be created in Elasticsearch: one associated with OrgA and the other with OrgB.
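The comment does not show how the identifier is computed, so the following is only a sketch of the idea, assuming a SHA-256 hash over the normalized name and source domain. The function name author_id, the normalization and the "|" separator are hypothetical choices, and the domains in the usage note stand in for OrgA and OrgB:

```python
import hashlib


def author_id(name: str, source_domain: str) -> str:
    """Derive a deterministic author identifier from name and source domain.

    Both parts are trimmed and lowercased, then joined with "|". A domain
    name cannot contain "|", so the last "|" always marks where the domain
    starts and different (name, domain) pairs cannot produce the same key.
    """
    key = f"{name.strip().lower()}|{source_domain.strip().lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

With this scheme, author_id("Newsroom", "orga.example") and author_id("Newsroom", "orgb.example") differ, so each would be stored as a separate Elasticsearch entry, while repeated articles from the same source map to the same identifier.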

Of course, this means we assume that the two authors are different, since the name is the only author information we have as input.

pstalidis commented 3 years ago

Apparently, different "groups" are already distinguished, so no action is needed.