JetBrains-Research / pubtrends

Scientific literature explorer. Runs a Pubmed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers
Apache License 2.0

Infinitely pending tasks, or is search in Semantic Scholar unstable? #285

Closed ctrltz closed 2 years ago

ctrltz commented 2 years ago

It is not always possible to retrieve the results from the provided link:

https://pubtrends.net/paper?source=Semantic%20Scholar&jobid=96584252-e074-42a7-894c-8a55fb3984ba&limit=&sort=&key=doi&value=10.1063/5.0021420

When you shared it with me, I was able to access the results (most likely because the job results were still stored in Redis), but my colleague was unable to get them the next day: "it says 'Task is in queue' (0%) since about 15 minutes."

Today I tried once again to launch the paper analysis for doi=10.1063/5.0021420 and observed the same behavior: "Task is in queue" for 15+ minutes. Meanwhile, I was able to successfully run another Pubmed search in a second tab.

Does it mean that tasks are processed in random order? Or was the paper analysis already assigned to one of the busy workers, while another worker became free later?

Anyway, launching anything for Semantic Scholar is quite often problematic for me, probably due to the large number of citations. For the link above, there was a paper with about 27k citations, which might cause significant delays.

olegs commented 2 years ago

Well, this looks like an error in task management rather than a delay in the analysis itself. I will check it soon. Thank you for reporting this.

olegs commented 2 years ago

A small investigation showed that the task itself is being executed, but the application fails to detect its state correctly. The task appears in the local active tasks output:

{'celery@86c24e74a828': [{'acknowledged': True,
                          'args': ['Semantic Scholar',
                                   'doi',
                                   '10.1063/5.0021420',
                                   False],
                          'delivery_info': {'exchange': '',
                                            'priority': 0,
                                            'redelivered': None,
                                            'routing_key': 'celery'},
                          'hostname': 'celery@86c24e74a828',
                          'id': '96584252-e074-42a7-894c-8a55fb3984ba',
                          'kwargs': {},
                          'name': 'analyze_search_paper',
                          'time_start': 1633965667.7976015,
                          'type': 'analyze_search_paper',
                          'worker_pid': 19}]}
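
For anyone debugging a similar case, here is a minimal sketch of how the output above could be searched for a given job id. It assumes the dictionary has the shape shown above (worker name mapped to a list of task dicts), which is what Celery's `app.control.inspect().active()` typically returns. The helper names `find_active_task` and `displayed_state` are hypothetical and not taken from the pubtrends code; the fallback rule in `displayed_state` is only one possible way to handle a task that is reported as pending while a worker is already executing it.

```python
from typing import Any, Dict, List, Optional, Tuple

# Trimmed copy of the active tasks output from the comment above.
ACTIVE: Dict[str, List[Dict[str, Any]]] = {
    'celery@86c24e74a828': [{
        'acknowledged': True,
        'args': ['Semantic Scholar', 'doi', '10.1063/5.0021420', False],
        'hostname': 'celery@86c24e74a828',
        'id': '96584252-e074-42a7-894c-8a55fb3984ba',
        'name': 'analyze_search_paper',
        'time_start': 1633965667.7976015,
        'worker_pid': 19,
    }]
}

JOB_ID = '96584252-e074-42a7-894c-8a55fb3984ba'


def find_active_task(
    active_tasks: Optional[Dict[str, List[Dict[str, Any]]]],
    job_id: str,
) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (worker_name, task_dict) if job_id is being executed, else None.

    Hypothetical helper. active_tasks may be None, e.g. when no worker
    replied to the inspect call.
    """
    for worker, tasks in (active_tasks or {}).items():
        for task in tasks or []:
            if task.get('id') == job_id:
                return worker, task
    return None


def displayed_state(
    reported_state: str,
    active_tasks: Optional[Dict[str, List[Dict[str, Any]]]],
    job_id: str,
) -> str:
    """Hypothetical rule: a task reported as PENDING that a worker is
    already executing is shown as STARTED instead of "in queue"."""
    if reported_state == 'PENDING' and find_active_task(active_tasks, job_id):
        return 'STARTED'
    return reported_state


if __name__ == '__main__':
    print(find_active_task(ACTIVE, JOB_ID))
    print(displayed_state('PENDING', ACTIVE, JOB_ID))
```

With a live Celery app, `ACTIVE` would be replaced by the result of the inspect call; the sample dictionary is used here only so the sketch runs without a broker.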

olegs commented 2 years ago

An updated version will be deployed soon.

olegs commented 2 years ago

At the moment, Semantic Scholar doesn't have a DOI index built, so lookup by DOI can be slow. See #286.