JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers.
Apache License 2.0
36 stars, 2 forks

Added keyword selection based on cosine distance for topics description generation #276

Closed (ctrltz closed 3 years ago)

olegs commented 3 years ago

Can you please add some tests to this method?

ctrltz commented 3 years ago

Sure, most likely this weekend.

ctrltz commented 3 years ago

Added some tests. Please use squash & merge to get rid of the unnecessary merge commits in this branch.

ctrltz commented 3 years ago

Yes, it is. I would also be grateful for your personal feedback/opinion on whether the new method works better or not.

olegs commented 3 years ago

> Yes, it is. I would also be grateful for your personal feedback/opinion on whether the new method works better or not.

After some investigation, it turned out that keyword selection based on cosine distance is more consistent with the term frequencies seen in graph search. TF-IDF is good for topic identification but is biased towards words that are frequent in general, which makes it worse for topic description.
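To illustrate the idea, here is a minimal sketch of cosine-based keyword selection. It is not the code from this PR: the function name `select_keywords_cosine`, the `counts` layout (topics by terms), and the toy data are all assumptions made for the example. Each term is treated as a vector of its counts across topics, and for each topic the terms closest (by cosine distance) to that topic's one-hot "ideal" vector are picked, so a word that is frequent everywhere does not dominate.

```python
import numpy as np


def select_keywords_cosine(counts, terms, n_words=2):
    """Hypothetical helper: pick n_words terms per topic by cosine distance.

    counts: array-like of shape (n_topics, n_terms) with term counts per topic.
    terms:  list of n_terms strings.
    """
    counts = np.asarray(counts, dtype=float)
    # Length of each term's vector of counts across topics
    norms = np.linalg.norm(counts, axis=0)
    norms[norms == 0] = 1.0  # terms that never occur keep similarity 0
    # Cosine similarity between a term's vector and the one-hot vector
    # of topic t reduces to counts[t, j] / norms[j]
    distance = 1.0 - counts / norms
    keywords = []
    for t in range(counts.shape[0]):
        closest = np.argsort(distance[t], kind="stable")[:n_words]
        keywords.append([terms[j] for j in closest])
    return keywords


# Toy data (made up): "study" is the most frequent word in both topics
terms = ["cell", "tumor", "neuron", "study"]
counts = [
    [10, 30, 0, 50],   # topic 0
    [12, 0, 25, 50],   # topic 1
]
print(select_keywords_cosine(counts, terms, n_words=1))
```

In this toy example, ranking by raw frequency would describe both topics with "study", whereas the cosine-based ranking prefers the terms concentrated in a single topic.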