JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers
Apache License 2.0

Terms search in SS does not seem to work great #292

Closed ctrltz closed 2 years ago

ctrltz commented 2 years ago

While looking for examples of queries where PubTrends works well on Semantic Scholar data, I became more and more sure that the search itself has problems.

For example, with the terms 'reinforcement learning', more than 500 out of 1000 papers have neither an abstract nor any citations, and the top-cited paper, 'Reinforcement learning' by Barto (https://www.semanticscholar.org/paper/0e3c001c3b89d35006512d1e168d82636d58a067), has only 2208 citations.

At the same time, it is possible to run the paper analysis for the paper 'Playing Atari with Deep Reinforcement Learning' (https://www.semanticscholar.org/paper/2319a491378867c7049b3da055c5df60e1671158), which has 4147 citations as displayed in PubTrends.

The line by_citations = 'count DESC NULLS LAST' in pysrc.papers.db.ss_postgres_loader:65 is used to sort by citations. It looks a bit odd, because the matview with citation counts is aliased as C in the full query. Shouldn't it be by_citations = 'C.count DESC NULLS LAST'?

Otherwise the issue might somehow be caused by the text search itself. Unfortunately, I am unable to debug it locally at the moment.

olegs commented 2 years ago

The syntax should be fine: since Publications doesn't have a count column, count is resolved to the matview field in the join result. I've launched a matview update manually to ensure that we don't have any issues with counts.

do
$$
begin
  IF exists (select matviewname from pg_matviews where matviewname = 'matview_sscitations') THEN
    refresh materialized view matview_sscitations;
  END IF;
end;
$$;

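The name-resolution point above can be checked in isolation. The following is a minimal sketch, not the actual PubTrends query: it uses an in-memory SQLite database instead of PostgreSQL, and the table names `publications` and `citation_counts` and the column `ssid` are hypothetical. It only shows that an unqualified `count` in ORDER BY sorts the same way as `C.count` when exactly one joined table has a column with that name. NULLS LAST is left out here because SQLite already places NULLs last in descending order, while PostgreSQL places them first by default, which is why the original line needs it.

```python
import sqlite3

# Hypothetical schema: only citation_counts has a column named "count".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publications (ssid TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE citation_counts (ssid TEXT PRIMARY KEY, count INTEGER);
    INSERT INTO publications VALUES ('a', 'Paper A'), ('b', 'Paper B'), ('c', 'Paper C');
    INSERT INTO citation_counts VALUES ('a', 10), ('b', 250);
""")


def titles_by_citations(conn, order_by):
    """Return titles sorted by the given ORDER BY expression."""
    query = (
        "SELECT P.title FROM publications P "
        "LEFT JOIN citation_counts C ON C.ssid = P.ssid "
        f"ORDER BY {order_by}"
    )
    return [row[0] for row in conn.execute(query)]


# Paper C has no row in citation_counts, so its count is NULL after the join.
unqualified = titles_by_citations(conn, "count DESC")
qualified = titles_by_citations(conn, "C.count DESC")
print(unqualified == qualified)
```

If both tables had a `count` column, the unqualified form would become ambiguous, so qualifying it as `C.count` is still the more defensive choice.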
olegs commented 2 years ago

Indeed, there was a problem with SS search; the results changed significantly after the manual materialized view update.
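Why a refresh can change search results so much: a materialized view stores a snapshot of its query, so counts computed from it stay frozen until the view is refreshed. The sketch below is an emulation under stated assumptions, not the project's code. SQLite has no materialized views, so a plain table named `matview_sscitations` is rebuilt by hand to stand in for `refresh materialized view`; the `citations` table and the columns `id_out`, `id_in`, and `ssid` are hypothetical.

```python
import sqlite3

# Stand-in for "refresh materialized view": rebuild the snapshot table.
REFRESH_SQL = """
    DELETE FROM matview_sscitations;
    INSERT INTO matview_sscitations
    SELECT id_in, COUNT(*) FROM citations GROUP BY id_in;
"""


def refresh(conn):
    conn.executescript(REFRESH_SQL)


def citation_count(conn, ssid):
    """Read the count from the snapshot; None if the paper is absent."""
    row = conn.execute(
        "SELECT count FROM matview_sscitations WHERE ssid = ?", (ssid,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citations (id_out TEXT, id_in TEXT);
    CREATE TABLE matview_sscitations (ssid TEXT PRIMARY KEY, count INTEGER);
    INSERT INTO citations VALUES ('p1', 'atari'), ('p2', 'atari');
""")
refresh(conn)
before = citation_count(conn, "atari")

# New citations arrive in the base table, but the snapshot is not rebuilt.
conn.executemany(
    "INSERT INTO citations VALUES (?, ?)", [("p3", "atari"), ("p4", "barto")]
)
stale = citation_count(conn, "atari")    # still the old value
missing = citation_count(conn, "barto")  # not in the snapshot at all

refresh(conn)
fresh = citation_count(conn, "atari")
print(before, stale, missing, fresh)
```

Papers that are missing from a stale snapshot come out of a left join with a NULL count, which would be consistent with many results showing no citations before the refresh.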

olegs commented 2 years ago

Example: https://pubtrends.net/result?query=reinforcement%20learning&source=Semantic%20Scholar&limit=1000&sort=Most%20Cited&jobid=4cd9be54-5e5c-4607-887e-82e13d1ff7c5

Screenshot 2021-10-14 at 14 58 12

ctrltz commented 2 years ago

Cool! I have also just checked this query, and it looks much better now.

In fact, this query might represent the capabilities of our tool quite nicely. Below you can see how the topics evolved over time, and I can draw several conclusions from it:

And I am pretty sure that many more nice examples will be available now :)

image

ctrltz commented 2 years ago

A manual update of the materialized view helped; closing the issue.