JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers.
Apache License 2.0

Discrepancy between the paper count shown in Journals and the papers listed on click #202

Closed. olegs closed this issue 4 years ago

olegs commented 4 years ago

Search for "deep learning" in Semantic Scholar. [Screenshot 2020-01-18 at 11 39 24] The Journals table shows 3 papers for Nature Methods, but clicking on it lists 6 papers. [Screenshot 2020-01-18 at 11 40 04]

olegs commented 4 years ago

A small investigation showed that this happens because not all papers are assigned a subtopic, i.e. some have subtopic -1.

Before dropping papers with missing subtopics:

[2020-01-18 13:03:57,045: WARNING/ForkPoolWorker-1] journal_stats
[2020-01-18 13:03:57,045: WARNING/ForkPoolWorker-1] 262
[2020-01-18 13:03:57,053: WARNING/ForkPoolWorker-1] After drop papers with undefined journal
[2020-01-18 13:03:57,054: WARNING/ForkPoolWorker-1] 255
[2020-01-18 13:03:57,189: WARNING/ForkPoolWorker-1] journal                    comp                      counts  sum
129                                              ArXiv  [-1, 0, 2, 4, 1, 3, 5]  [41, 27, 20, 14, 11, 9, 8]  130
207                                     Nature Methods              [-1, 3, 0]                   [3, 2, 1]    6
137         Brain and nerve = Shinkei kenkyu no shinpo                    [-1]                         [3]    3
92   2018 International Congress on Big Data, Deep ...                 [-1, 1]                      [2, 1]    3
26   2017 25th Signal Processing and Communications...                    [-1]                         [3]    3
62   2018 26th Signal Processing and Communications...                 [-1, 1]                      [2, 1]    3

After:

[2020-01-18 13:03:57,191: WARNING/ForkPoolWorker-1] After drop papers with undefined subtopic
[2020-01-18 13:03:57,192: WARNING/ForkPoolWorker-1] 143
[2020-01-18 13:03:57,326: WARNING/ForkPoolWorker-1] journal                comp                  counts  sum
68                                               ArXiv  [0, 2, 4, 1, 3, 5]  [27, 20, 14, 11, 9, 8]   89
117                                     Nature Methods              [3, 0]                  [2, 1]    3
106                     Journal of High Energy Physics                 [4]                     [2]    2
43   2018 IEEE/ACM International Conference on Comp...              [3, 5]                  [1, 1]    2
49   2018 Second International Conference on Intell...              [1, 2]                  [1, 1]    2

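This agrees with the 6 papers for Nature Methods that we get when navigating to the journal. The journal statistics count only the 3 papers with an assigned subtopic, while the list shown on click also includes the 3 papers with subtopic -1.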
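The Nature Methods numbers from the logs can be reproduced with a small standalone sketch. It is not the pubtrends code: the `papers` list and the `journal_counts` helper are hypothetical, and only the subtopic split (3 papers with -1, 2 in subtopic 3, 1 in subtopic 0) is taken from the log output.

```python
from collections import Counter

# Hypothetical (journal, subtopic) records. The split for Nature Methods
# mirrors the log: comp [-1, 3, 0] with counts [3, 2, 1].
papers = (
    [("Nature Methods", -1)] * 3
    + [("Nature Methods", 3)] * 2
    + [("Nature Methods", 0)] * 1
)


def journal_counts(records, drop_unassigned):
    """Count papers per journal, optionally skipping subtopic -1."""
    return Counter(
        journal
        for journal, subtopic in records
        if not (drop_unassigned and subtopic == -1)
    )


with_unassigned = journal_counts(papers, drop_unassigned=False)
without_unassigned = journal_counts(papers, drop_unassigned=True)

print(with_unassigned["Nature Methods"])     # 6, the papers listed on click
print(without_unassigned["Nature Methods"])  # 3, the count shown in Journals
```

The two views agree only if both apply the same filter on subtopic -1. Which of the two should change is not stated in this thread.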