Document report card - word cloud

agaulton commented 7 years ago

A lot of these word clouds don't seem very informative: they contain only one term, and often it doesn't seem to be the most relevant one, e.g., https://chembl-glados.herokuapp.com/document_report_card/CHEMBL1177698/

Sometimes clicking on the term retrieves only the document you started with.

Also, we should probably not show that section if the cloud is empty, e.g., https://chembl-glados.herokuapp.com/document_report_card/CHEMBL1201862/

The relationship with the Related Documents section (which currently has fake data) is not clear. Is it the same data?

mnowotka commented 7 years ago

It's not the same data. This one is implemented using the TextRank algorithm, which extracts the most relevant keywords from abstracts. A network of documents sharing common keywords is then created. Another way would be to use the LSA/LSI algorithms from the gensim library. We started discussing this but never got time to have a proper look.
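
For illustration only, here is a minimal sketch of the two steps described above: TextRank-style keyword extraction from an abstract, followed by linking documents that share keywords. This is not the actual GLaDOS code. The function names (`textrank_keywords`, `shared_keyword_network`), the stopword list, and the parameters `window`, `damping`, and `iterations` are all assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "with",
             "on", "is", "are", "by", "as", "that", "this", "was", "were"}


def textrank_keywords(text, top_n=5, window=3, damping=0.85, iterations=50):
    """Rank the words of `text` with PageRank over a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z][a-z0-9-]+", text.lower())
             if w not in STOPWORDS]
    # Link each word to the next (window - 1) words, as undirected edges.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return []  # empty or one-word abstract: no keywords, so no cloud
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbour's score.
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


def shared_keyword_network(doc_keywords):
    """Map each pair of document IDs to the keywords they have in common."""
    edges = {}
    for a, b in combinations(sorted(doc_keywords), 2):
        common = set(doc_keywords[a]) & set(doc_keywords[b])
        if common:  # pairs with no shared keyword get no edge
            edges[(a, b)] = common
    return edges
```

In this sketch an abstract that yields no co-occurring words returns an empty list, which is the case where the word cloud section could be hidden. A keyword that appears in only one document produces no edge, which would match the observation that clicking a term sometimes retrieves only the starting document.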

nclopezo commented 5 years ago

Closing because of https://github.com/chembl/GLaDOS/issues/1073#issuecomment-479805852

Document report card - word cloud #343