JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and allows the user to explore the high-level structure of the resulting papers.
Apache License 2.0

ValueError: After pruning, no terms remain. Try a lower min_df or a higher max_df. #295

Closed: olegs closed this issue 2 years ago

olegs commented 2 years ago
[2021-10-15 18:41:30,575: INFO/ForkPoolWorker-4] Searching 1000 most cited publications matching bacteriophages suppressors bacterial immunity, not reviews
[2021-10-15 18:41:32,105: INFO/ForkPoolWorker-4] Found 1 publications in the database
[2021-10-15 18:41:32,106: INFO/ForkPoolWorker-4] Expanding related papers by references
[2021-10-15 18:41:32,184: INFO/ForkPoolWorker-4] Loading publication data
[2021-10-15 18:41:32,195: INFO/ForkPoolWorker-4] Found 1 papers in database
[2021-10-15 18:41:32,196: INFO/ForkPoolWorker-4] Analyzing title and abstract texts
[2021-10-15 18:41:32,197: INFO/ForkPoolWorker-4] Building corpus from 1 papers
[2021-10-15 18:41:32,197: INFO/ForkPoolWorker-4] Processing stemming for all papers
[2021-10-15 18:41:32,217: INFO/ForkPoolWorker-4] Creating global shortest stem to word map
[2021-10-15 18:41:32,217: INFO/ForkPoolWorker-4] Creating stemmed corpus
[2021-10-15 18:41:32,223: ERROR/ForkPoolWorker-4] Task analyze_search_terms[19f2ce56-ea88-49d8-82f8-838ff6935a3c] raised unexpected: ValueError('After pruning, no terms remain. Try a lower min_df or a higher max_df.')
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/user/pysrc/celery/tasks_main.py", line 47, in analyze_search_terms
    analyzer.analyze_papers(ids, query, test=test, task=current_task)
  File "/home/user/pysrc/papers/analyzer.py", line 135, in analyze_papers
    self.corpus, self.corpus_tokens, self.corpus_counts = vectorize_corpus(
  File "/home/user/pysrc/papers/analysis/text.py", line 46, in vectorize_corpus
    counts = vectorizer.fit_transform([list(chain(*sentences)) for sentences in papers_sentences_corpus])
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1221, in fit_transform
    X, self.stop_words_ = self._limit_features(X, vocabulary,
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1092, in _limit_features
    raise ValueError("After pruning, no terms remain. Try a lower"
ValueError: After pruning, no terms remain. Try a lower min_df or a higher max_df.
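The traceback ends in scikit-learn's document-frequency pruning. The search found a single paper, so every term in the corpus has a document frequency of 1. Assuming both bounds are fractions (min_df < max_df < 1.0), max_df * 1 is below 1, so every term is pruned and this ValueError is raised. The sketch below reproduces the failure and shows one possible guard. It assumes a scikit-learn CountVectorizer over pre-tokenized documents. The values MIN_DF, MAX_DF and MIN_DOCS_FOR_PRUNING and the helper names are hypothetical and are not taken from pubtrends' vectorize_corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical settings; the real pubtrends values are not shown in the traceback.
MIN_DF = 0.001             # fractional lower bound on document frequency
MAX_DF = 0.8               # fractional upper bound on document frequency
MIN_DOCS_FOR_PRUNING = 10  # hypothetical corpus size below which pruning is disabled


def identity(tokens):
    # Documents are already lists of (stemmed) tokens, so pass them through.
    return tokens


def vectorize(token_docs, min_df, max_df):
    vectorizer = CountVectorizer(analyzer=identity, min_df=min_df, max_df=max_df)
    counts = vectorizer.fit_transform(token_docs)
    return vectorizer, counts


def df_bounds(n_docs):
    # In a tiny corpus most terms occur in a large fraction of documents, so a
    # fractional max_df can remove the entire vocabulary. Disable pruning there:
    # min_df=1 keeps terms seen at least once, max_df=1.0 keeps terms seen everywhere.
    if n_docs < MIN_DOCS_FOR_PRUNING:
        return 1, 1.0
    return MIN_DF, MAX_DF


def vectorize_safely(token_docs):
    min_df, max_df = df_bounds(len(token_docs))
    return vectorize(token_docs, min_df, max_df)
```

Usage: with a single tokenized paper, vectorize(docs, MIN_DF, MAX_DF) raises the same "After pruning, no terms remain" error, while vectorize_safely(docs) keeps the full vocabulary. An alternative design is to skip text analysis entirely when the search returns too few papers. Which option fits pubtrends better is not settled by this issue.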
olegs commented 2 years ago

Closing as a duplicate of https://github.com/JetBrains-Research/pubtrends/issues/307