JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and allows users to explore the high-level structure of the resulting papers.
Apache License 2.0

Empty similarity graph problem during topics evolution #247

Closed: olegs closed this issue 2 years ago

olegs commented 3 years ago
[2021-03-07 22:31:27,228: INFO/MainProcess] Received task: analyze_search_terms[predefined_29c2dee53c008e3df9696ff046a8e036]  
[2021-03-07 22:31:27,232: INFO/ForkPoolWorker-2] Analyzing search query
[2021-03-07 22:31:27,236: INFO/ForkPoolWorker-2] Searching 500 most cited publications matching COVID-19,2019-nCoV,SARS-CoV,MERS-CoV
[2021-03-07 22:31:27,237: INFO/ForkPoolWorker-2] Preferring non review papers
[2021-03-07 22:31:38,400: INFO/ForkPoolWorker-2] Found 500 publications in the local database
[2021-03-07 22:31:38,401: INFO/ForkPoolWorker-2] Expanding related papers by references
[2021-03-07 22:31:51,053: INFO/ForkPoolWorker-2] Expanded to 600 papers
[2021-03-07 22:31:51,055: INFO/ForkPoolWorker-2] Loading publication data
[2021-03-07 22:31:51,107: INFO/ForkPoolWorker-2] Analyzing title and abstract texts
[2021-03-07 22:31:51,108: INFO/ForkPoolWorker-2] Building corpus from 540 papers
[2021-03-07 22:31:59,833: INFO/ForkPoolWorker-2] Processing texts similarity
[2021-03-07 22:32:00,190: INFO/ForkPoolWorker-2] Loading citations statistics by year
[2021-03-07 22:32:01,442: INFO/ForkPoolWorker-2] Found 2479 records of citations by year
[2021-03-07 22:32:01,466: INFO/ForkPoolWorker-2] Loading citations data
[2021-03-07 22:32:01,638: INFO/ForkPoolWorker-2] Found 2926 citations between papers
[2021-03-07 22:32:01,639: INFO/ForkPoolWorker-2] Building citation graph
[2021-03-07 22:32:01,996: INFO/ForkPoolWorker-2] Built citation graph - 511 nodes and 2926 edges
[2021-03-07 22:32:01,997: INFO/ForkPoolWorker-2] Calculating co-citations for selected papers
[2021-03-07 22:32:06,927: INFO/ForkPoolWorker-2] Found 105502 co-cited pairs of papers
[2021-03-07 22:32:06,942: INFO/ForkPoolWorker-2] Filtering co-citations with min threshold 2
[2021-03-07 22:32:06,943: INFO/ForkPoolWorker-2] Filtered 84809 co-cited pairs of papers
[2021-03-07 22:32:06,943: INFO/ForkPoolWorker-2] Processing bibliographic coupling for selected papers
[2021-03-07 22:32:07,055: INFO/ForkPoolWorker-2] Found 29733 bibliographic coupling pairs of papers
[2021-03-07 22:32:07,056: INFO/ForkPoolWorker-2] Filtering bibliographic coupling with min threshold 2
[2021-03-07 22:32:07,059: INFO/ForkPoolWorker-2] Filtered 13477 bibliographic coupling pairs of papers
[2021-03-07 22:32:07,059: INFO/ForkPoolWorker-2] Building papers similarity graph
[2021-03-07 22:32:07,326: INFO/ForkPoolWorker-2] Built similarity graph - 540 nodes and 86784 edges
[2021-03-07 22:32:07,327: INFO/ForkPoolWorker-2] Extracting topics from paper similarity graph
[2021-03-07 22:32:11,718: INFO/ForkPoolWorker-2] Computing topics descriptions
[2021-03-07 22:32:26,934: INFO/ForkPoolWorker-2] Performing PageRank analysis
[2021-03-07 22:32:27,015: INFO/ForkPoolWorker-2] Identifying top cited papers
[2021-03-07 22:32:27,017: INFO/ForkPoolWorker-2] Identifying top cited papers for each year
[2021-03-07 22:32:27,073: INFO/ForkPoolWorker-2] Identifying hot papers of the year
[2021-03-07 22:32:27,190: INFO/ForkPoolWorker-2] Finding popular authors
[2021-03-07 22:32:28,153: INFO/ForkPoolWorker-2] Finding popular journals
[2021-03-07 22:32:28,204: INFO/ForkPoolWorker-2] Extracting numbers from publication abstracts
[2021-03-07 22:32:48,521: INFO/ForkPoolWorker-2] Studying evolution of topics in 1990 - 2021
[2021-03-07 22:32:48,522: INFO/ForkPoolWorker-2] Processing year 2011
[2021-03-07 22:32:48,933: INFO/ForkPoolWorker-2] Building papers similarity graph
[2021-03-07 22:32:48,943: INFO/ForkPoolWorker-2] Built similarity graph - 75 nodes and 1630 edges
[2021-03-07 22:32:48,950: INFO/ForkPoolWorker-2] Extracting topics from paper similarity graph
[2021-03-07 22:32:48,993: INFO/ForkPoolWorker-2] Processing year 2001
[2021-03-07 22:32:49,607: INFO/ForkPoolWorker-2] Building papers similarity graph
[2021-03-07 22:32:49,611: INFO/ForkPoolWorker-2] Built similarity graph - 2 nodes and 1 edges
[2021-03-07 22:32:49,611: INFO/ForkPoolWorker-2] Extracting topics from paper similarity graph
[2021-03-07 22:32:49,613: INFO/ForkPoolWorker-2] Processing year 1991
[2021-03-07 22:32:49,992: INFO/ForkPoolWorker-2] Building papers similarity graph
[2021-03-07 22:32:49,994: INFO/ForkPoolWorker-2] Built similarity graph - 0 nodes and 0 edges
[2021-03-07 22:32:49,995: INFO/ForkPoolWorker-2] Extracting topics from paper similarity graph
[2021-03-07 22:32:50,003: INFO/ForkPoolWorker-2] Generating evolution topics description by top cited papers
[2021-03-07 22:32:50,004: INFO/ForkPoolWorker-2] Generating topics descriptions for year 1991
[2021-03-07 22:32:50,009: ERROR/ForkPoolWorker-2] Task analyze_search_terms[predefined_29c2dee53c008e3df9696ff046a8e036] raised unexpected: ValueError('Found array with 0 sample(s) (shape=(0, 1102)) while a minimum of 1 is required.')
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/user/pysrc/celery/tasks_main.py", line 31, in analyze_search_terms
    analyzer.analyze_papers(ids, query, task=current_task)
  File "/home/user/pysrc/papers/analyzer.py", line 369, in analyze_papers
    self.evolution_kwds = self.topic_evolution_descriptions(
  File "/home/user/pysrc/papers/analyzer.py", line 933, in topic_evolution_descriptions
    evolution_kwds[col] = get_evolution_topics_description(
  File "/home/user/pysrc/papers/utils.py", line 184, in get_evolution_topics_description
    tfidf = compute_comps_tfidf(df, comps, corpus_counts, ignore_comp=-1)
  File "/home/user/pysrc/papers/utils.py", line 220, in compute_comps_tfidf
    return compute_tfidf(terms_freqs_per_comp)
  File "/home/user/pysrc/papers/utils.py", line 225, in compute_tfidf
    tfidf = tfidf_transformer.fit_transform(counts)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/base.py", line 571, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1450, in fit
    X = check_array(X, accept_sparse=('csr', 'csc'))
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/utils/validation.py", line 583, in check_array
    raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 1102)) while a minimum of 1 is required.
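Reading the log together with the traceback, the failure appears to come from the year 1991: its similarity graph was built with 0 nodes and 0 edges, so no topics could be extracted, and `compute_tfidf` then passed an empty count matrix of shape (0, 1102) to scikit-learn's `TfidfTransformer.fit_transform`, which rejects inputs with zero samples. The sketch below reproduces that error and shows one possible guard. `compute_tfidf_safe` is a hypothetical helper, not a pubtrends function; it assumes `counts` is a topics-by-terms frequency matrix, as the traceback suggests.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer


def compute_tfidf_safe(counts):
    """Hypothetical guard around TfidfTransformer.fit_transform.

    counts: (n_topics, n_terms) matrix of term frequencies per topic.
    Returns None when there are no topics (e.g. a year whose similarity
    graph had 0 nodes) instead of letting scikit-learn raise ValueError.
    """
    if counts.shape[0] == 0:
        return None
    return TfidfTransformer().fit_transform(counts)


# Reproduce the failure: zero topics (rows) over a 1102-term vocabulary.
empty_counts = np.zeros((0, 1102))
try:
    TfidfTransformer().fit_transform(empty_counts)
except ValueError as e:
    print("sklearn raised:", e)

# The guarded version skips the empty year instead of crashing.
print(compute_tfidf_safe(empty_counts))  # None

# Non-empty input still works: 2 topics over 3 terms.
counts = np.array([[3, 0, 1], [0, 2, 1]])
print(compute_tfidf_safe(counts).shape)  # (2, 3)
```

Returning None (rather than an empty matrix) forces the caller to decide explicitly how to describe a year with no topics; skipping such years earlier in the evolution loop would be an equally plausible fix.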
olegs commented 2 years ago

Closing as obsolete, since we rely on embeddings.