fandangOrg / fandango

FAke News discovery and propagation from big Data ANalysis and artificial intelliGence Operations

offline services: create/update document in fdg_article (removing aggregation step) #100

Closed mmagaldi-eng closed 3 years ago

mmagaldi-eng commented 3 years ago

As agreed in the last technical call about the offline process, we will remove the aggregation steps (and the related constraints) and write directly to Elasticsearch (ES), to streamline both ingestion and development activities:

All analyzers will write directly to the fdg_article index, creating or updating the JSON document using the “upsert” element. More specifically, the analyzer steps are:

Notes

  1. @macagari, @tavitto16, @pstalidis please refer to the “ARTICLE_INDEX” sheet in the shared documentation to check the fields that should be provided by your services (“set from” column of the sheet). See also https://github.com/fandangOrg/fandango/issues/97#issuecomment-762134519
  2. @pstalidis and @neilpbyrne will help to write/check the update scripts
  3. The calculatedRating and calculatedRatingDetail (a JSON object) fields will be added by the fusion score service
  4. @pstalidis it is assumed that the fusion score service was designed to handle concurrency issues during score calculation (multiple concurrent requests may occur for the same document).

pstalidis commented 3 years ago

About note 2, an example of an upsert with the Elasticsearch Python client is the following: elastic.update(index=index, id=article, body={ "doc": {"about": about, "topic": topic, "mentions": mentions}, "upsert": {"about": about, "topic": topic, "mentions": mentions, "identifier": article} })

Here, the "doc" part holds the fields for the partial update, and the "upsert" part holds what is indexed if the document does not exist yet.
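The upsert call quoted above can be wrapped in a small helper. The following is only a minimal sketch under stated assumptions: build_upsert_body, write_analysis and RecordingClient are hypothetical names that do not come from the FANDANGO code base, and RecordingClient is a stand-in that merely records the call, so the example runs without an Elasticsearch cluster. With a real client, the same update(index=..., id=..., body=...) call shape as in the comment above would be used.

```python
from typing import Any, Dict


def build_upsert_body(article_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build an update body in the shape quoted in the thread.

    "doc" carries the partial update applied when the document exists;
    "upsert" carries the full document indexed when it is missing.
    """
    return {
        "doc": dict(fields),
        "upsert": {**fields, "identifier": article_id},
    }


def write_analysis(client: Any, index: str, article_id: str,
                   fields: Dict[str, Any]) -> Any:
    """Create or update one article document through the given client."""
    body = build_upsert_body(article_id, fields)
    return client.update(index=index, id=article_id, body=body)


class RecordingClient:
    """Hypothetical stand-in for a client object; it only records calls."""

    def __init__(self) -> None:
        self.calls = []

    def update(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"_id": kwargs["id"]}


if __name__ == "__main__":
    client = RecordingClient()
    # The field values here are made-up placeholders, not real article data.
    write_analysis(client, "fdg_article", "article-1",
                   {"about": ["example"], "topic": "example-topic", "mentions": []})
    print(client.calls[0]["body"])
```

Keeping "identifier" only in the "upsert" part mirrors the quoted example: the identifier is set once when the document is created and is left untouched by later partial updates.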

pstalidis commented 3 years ago

About note 1, I have updated the code for offline topics and offline media.

pstalidis commented 3 years ago

About notes 3 and 4, the offline fusion score was updated together with the online part.

The process takes into account that there can be multiple concurrent calls for the same document.
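The thread does not say how the fusion score service handles these concurrent calls. One common pattern for this kind of problem is optimistic concurrency with retries: read the document and its version, compute the change, and write only if the version is unchanged, retrying otherwise. The sketch below illustrates that pattern with an in-memory store. It is an assumption for illustration only, not the service's actual implementation, and VersionedStore, VersionConflict and update_with_retry are hypothetical names.

```python
import threading
from typing import Callable, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    """Hypothetical in-memory stand-in for an index with per-document versions."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}
        self._lock = threading.Lock()

    def get(self, doc_id: str) -> Tuple[int, dict]:
        # A missing document is reported as version 0 with an empty body.
        with self._lock:
            version, doc = self._docs.get(doc_id, (0, {}))
            return version, dict(doc)

    def put_if_version(self, doc_id: str, expected_version: int, doc: dict) -> None:
        # The write succeeds only if nobody else wrote since our read.
        with self._lock:
            current, _ = self._docs.get(doc_id, (0, {}))
            if current != expected_version:
                raise VersionConflict(doc_id)
            self._docs[doc_id] = (current + 1, dict(doc))


def update_with_retry(store: VersionedStore, doc_id: str,
                      modify: Callable[[dict], dict], max_retries: int = 50) -> None:
    """Re-read and re-apply `modify` until the versioned write succeeds."""
    for _ in range(max_retries):
        version, doc = store.get(doc_id)
        try:
            store.put_if_version(doc_id, version, modify(doc))
            return
        except VersionConflict:
            continue  # another writer won the race; retry on fresh data
    raise RuntimeError(f"gave up updating {doc_id} after {max_retries} attempts")
```

Elasticsearch offers comparable building blocks (versioned writes and a retry_on_conflict option on updates), but whether the fusion score service relies on them is not stated in this thread.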

pstalidis commented 3 years ago

@macagari @tavitto16 when your services are ready, please report here so that we can start an actual test.

macagari commented 3 years ago

The offline process modifications are online.