anHALytics / anhalytics-core

Analytic platform for the HAL research archive (in development)
Apache License 2.0

compute/index CoAuthors #36

Open Aazhar opened 8 years ago

Aazhar commented 8 years ago

Co-authors should be extracted before indexing, with duplicate names handled using common distance measures.
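
A minimal sketch of what duplicate handling could look like, assuming "common distances" means a string similarity between author names. The helper names (`normalize`, `similarity`, `merge_coauthors`) and the 0.85 threshold are hypothetical and not part of anhalytics-core. It uses the standard-library `difflib.SequenceMatcher` rather than any particular metric chosen by the project.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase, drop dots and commas, and collapse repeated whitespace.
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized names are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_coauthors(names, threshold=0.85):
    # Greedy clustering: each name joins the first cluster whose
    # representative (its first member) is similar enough, otherwise
    # it starts a new cluster. The threshold is an assumed value.
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


groups = merge_coauthors(["Jane Doe", "jane  doe.", "John Smith"])
```

This only catches spelling and punctuation variants. Reordered names ("Doe, Jane") or initials would need extra normalization before comparison.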

Aazhar commented 8 years ago

The indexer returns data in buckets, so we might not get all the co-authors. By the way, the same applies to keywords and interests.

kermitt2 commented 8 years ago

The impact depends somewhat on the task: we can get the top n co-authors, but indeed not all of them (even with the latest Elasticsearch aggregations). The question is thus whether we need to provide all of the co-authors or simply the top n; I think the latter makes more sense for an analytics task.
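
For illustration, the top n option could be sketched with an Elasticsearch `terms` aggregation, built here as a plain Python dict. The field name `authors.id`, the aggregation name `top_coauthors`, and both helper functions are assumptions made for this example; the actual anHALytics mapping may differ.

```python
def top_coauthors_query(author_id, n):
    # Match documents written by the given author and bucket them by
    # author id. The author always appears in their own documents, so
    # n + 1 buckets are requested and the author is dropped afterwards.
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"authors.id": author_id}},
        "aggs": {
            "top_coauthors": {
                "terms": {"field": "authors.id", "size": n + 1}
            }
        },
    }


def parse_top_coauthors(response, author_id, n):
    # Terms aggregation buckets carry "key" and "doc_count".
    buckets = response["aggregations"]["top_coauthors"]["buckets"]
    pairs = [(b["key"], b["doc_count"]) for b in buckets if b["key"] != author_id]
    return pairs[:n]


# Hand-written response in the shape Elasticsearch returns for a terms aggregation.
fake_response = {
    "aggregations": {
        "top_coauthors": {
            "buckets": [
                {"key": "a1", "doc_count": 12},
                {"key": "a7", "doc_count": 5},
                {"key": "a3", "doc_count": 2},
            ]
        }
    }
}
top = parse_top_coauthors(fake_response, "a1", 2)
```

Since buckets are ordered by document count by default, the result is the n most frequent co-authors rather than the complete list, which matches the trade-off described above.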