UP2040499 closed this issue 1 year ago.
It is possible to use NER (named entity recognition) from the start for this.
However, we will need to check whether this can be done in a reasonable time for ~200-300 sources.
This can be done using BeautifulSoup to parse the HTML.
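A minimal sketch of the parse-then-extract step described above. The thread names BeautifulSoup; to keep this example dependency-free it swaps in the standard library's `html.parser` for extracting visible text. `extract_entities` is a hypothetical placeholder (a capitalised-word regex), not a real NER model; in practice it would be replaced by whatever NER the project uses.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Text chunks are joined with spaces, then runs of whitespace are collapsed.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical stand-in for a real NER model: runs of capitalised words.
_CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(text: str) -> list[str]:
    return _CAPITALISED_RUN.findall(text)


def count_mentions(html_sources: list[str]) -> Counter:
    """Count entity mentions across every source's HTML."""
    counts: Counter = Counter()
    for html in html_sources:
        counts.update(extract_entities(html_to_text(html)))
    return counts
```

Swapping the regex for a real NER pipeline only changes `extract_entities`; timing `count_mentions` over the full ~200-300 sources would answer the performance question raised above.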
One option is a threshold on the number of mentions across all sources: an entity is added to the Popular Entity Store only once its mention count reaches that threshold.
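A threshold rule like this could look as follows. This is a sketch assuming each source has already been reduced to a list of entity strings; `total_mentions` and `select_by_threshold` are hypothetical names, and the threshold value is left to the caller.

```python
from collections import Counter
from typing import Iterable


def total_mentions(entities_per_source: Iterable[list[str]]) -> Counter:
    """Sum entity mentions over every source."""
    counts: Counter = Counter()
    for entities in entities_per_source:
        counts.update(entities)
    return counts


def select_by_threshold(counts: Counter, min_mentions: int) -> set[str]:
    """Keep only entities whose total mention count reaches min_mentions."""
    return {entity for entity, n in counts.items() if n >= min_mentions}
```

Because the threshold is absolute, the number of stored entities can grow with the number of sources, which is one trade-off against a fixed-size store.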
Alternatively, use a leaderboard-style system with a maximum number of stored entities, e.g. store only the 30 most popular entities.
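A leaderboard can be sketched with `collections.Counter.most_common`. The sketch also includes a percentage cut-off, one reading of the "top x%" approach in the next comment, interpreted here (as an assumption) as the top x% of distinct entities by mention count. Both function names are hypothetical.

```python
import math
from collections import Counter


def top_n(counts: Counter, n: int = 30) -> list[tuple[str, int]]:
    """Leaderboard: the n most-mentioned entities (ties keep first-seen order)."""
    return counts.most_common(n)


def top_percent(counts: Counter, percent: float) -> list[tuple[str, int]]:
    """Top `percent`% of distinct entities, rounded up; at least one if any exist."""
    if not counts:
        return []
    k = max(1, math.ceil(len(counts) * percent / 100))
    return counts.most_common(k)
```

A fixed `n` keeps the store size constant, while a percentage scales with how many distinct entities the sources produce; which behaves better is the open tuning question noted below.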
Popular information can currently be found, and the top x% of popular entities can be produced. This percentage is configurable; more investigation is needed to find an optimal value. To finish:
Closing, as popular information finding has been completed. Assigning scores is handled by the Priority Manager in #40.
This finds information that is popular amongst the sources found in Source Aggregation #17. Once found, individual (discrete) entities are stored in a Popular Entity Store, which is accessed by the Priority Manager #20.
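The store described above could be as simple as a mapping from entity to mention count that the discovery step overwrites and other components read. This is a hypothetical sketch; `PopularEntityStore` and its methods are illustrative names, not the project's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class PopularEntityStore:
    """Hypothetical store of discrete popular entities and their mention counts."""

    _entities: dict[str, int] = field(default_factory=dict)

    def replace_all(self, ranked: list[tuple[str, int]]) -> None:
        """Overwrite the store with a freshly computed (entity, count) ranking."""
        self._entities = dict(ranked)

    def contains(self, entity: str) -> bool:
        return entity in self._entities

    def mentions(self, entity: str) -> int:
        """Stored mention count, or 0 for entities that are not popular."""
        return self._entities.get(entity, 0)

    def entities(self) -> list[str]:
        """Entities in the order they were stored (most popular first)."""
        return list(self._entities)
```

In this shape a consumer such as the Priority Manager would only read from the store (`contains`, `mentions`), so scoring can live separately from popularity discovery.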