JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers.
Apache License 2.0

ValueError #307

Closed: olegs closed this issue 2 years ago

olegs commented 2 years ago

Try the paper search: Semi-supervised peak calling with SPAN and JBR Genome Browser

[2022-01-17 20:43:56,132: INFO/ForkPoolWorker-2] Searching for a publication with title=Semi-supervised peak calling with SPAN and JBR Genome Browser
[2022-01-17 20:43:56,497: INFO/ForkPoolWorker-2] Analyzing 1 paper(s) from Pubmed
[2022-01-17 20:43:56,498: INFO/ForkPoolWorker-2] Expanding related papers by references
[2022-01-17 20:43:56,512: INFO/ForkPoolWorker-2] Loading publication data
[2022-01-17 20:43:56,526: INFO/ForkPoolWorker-2] Found 1 papers in database
[2022-01-17 20:43:56,527: INFO/ForkPoolWorker-2] Analyzing title and abstract texts
[2022-01-17 20:43:56,527: INFO/ForkPoolWorker-2] Building corpus from 1 papers
[2022-01-17 20:43:56,527: INFO/ForkPoolWorker-2] Processing stemming for all papers
[2022-01-17 20:43:56,556: INFO/ForkPoolWorker-2] Creating global shortest stem to word map
[2022-01-17 20:43:56,556: INFO/ForkPoolWorker-2] Creating stemmed corpus
[2022-01-17 20:43:56,566: ERROR/ForkPoolWorker-2] Task analyze_search_paper[08f50970-dd88-4da1-a8b2-86aefa187cad] raised unexpected: ValueError('After pruning, no terms remain. Try a lower min_df or a higher max_df.')
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/user/pysrc/celery/tasks_main.py", line 149, in analyze_search_paper
    return _analyze_id_list(
  File "/home/user/pysrc/celery/tasks_main.py", line 123, in _analyze_id_list
    analyzer.analyze_papers(ids, query, topics, test=test, task=task)
  File "/home/user/pysrc/papers/analyzer.py", line 150, in analyze_papers
    self.corpus, self.corpus_tokens, self.corpus_counts = vectorize_corpus(
  File "/home/user/pysrc/papers/analysis/text.py", line 48, in vectorize_corpus
    counts = vectorizer.fit_transform([list(chain(*sentences)) for sentences in papers_sentences_corpus])
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1221, in fit_transform
    X, self.stop_words_ = self._limit_features(X, vocabulary,
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1092, in _limit_features
    raise ValueError("After pruning, no terms remain. Try a lower"
ValueError: After pruning, no terms remain. Try a lower min_df or a higher max_df
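
For context, the failure can be reproduced directly with scikit-learn: a single-document corpus combined with any `min_df` above 1 prunes every term, which is exactly what happens here since only one paper was found. The sketch below is an assumption about how `vectorize_corpus` hits the error (the actual pruning thresholds in pubtrends may differ) and shows a hypothetical guard that clamps `min_df` to the corpus size; it is not the project's actual fix.

```python
# Minimal sketch, assuming vectorize_corpus relies on sklearn's CountVectorizer
# with min_df/max_df pruning (exact parameters in pubtrends may differ).
from sklearn.feature_extraction.text import CountVectorizer

# Corpus of a single paper, as in the failing search above.
corpus = ["semi-supervised peak calling with span and jbr genome browser"]

# With one document, any min_df > 1 prunes every term and raises the
# ValueError reported in the traceback.
vectorizer = CountVectorizer(min_df=2)
try:
    vectorizer.fit_transform(corpus)
except ValueError as e:
    print(e)  # After pruning, no terms remain. Try a lower min_df or a higher max_df.

# Hypothetical guard: clamp the pruning threshold to the corpus size before fitting.
n_docs = len(corpus)
safe_vectorizer = CountVectorizer(min_df=min(2, n_docs), max_df=1.0)
counts = safe_vectorizer.fit_transform(corpus)
print(counts.shape)  # e.g. (1, n_terms) -- vectorization succeeds for a single paper
```

An alternative would be to catch the ValueError in the analysis task and retry with relaxed thresholds, but either way the single-paper case needs special handling before `fit_transform` is called.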