dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License
1.37k stars, 307 forks

ValueError: n_samples=4 should be >= n_clusters=40 #128

Closed by chagri 2 years ago

chagri commented 2 years ago

I am using the SentenceBERT summarizer on the following body:

As you begin planning your campaign, you may have some questions about whether or not Indiegogo allows the type of campaign you want to run. In general, for-profit and nonprofit campaigns are allowed on Indiegogo. Keep in mind as you create your perks , Indiegogo has rules about what you can and can't offer . We also recommend reading through our Terms of Use and Community Guidelines for more information on what we allow on Indiegogo. What is allowed on Indiegogo? For-profit campaigns Campaigns benefitting nonprofit organizations or nonprofit beneficiaries Campaigns for products Anything within "Community Projects" Educational campaigns in the Tech and Innovation category Please read through our Terms of Use and Community Guidelines for more information on what we allow on Indiegogo.

from summarizer.sbert import SBertSummarizer

# body holds the text quoted above
model = SBertSummarizer('paraphrase-MiniLM-L6-v2')
result = model(body, 10, return_as_list=True)

I get this error:

Traceback (most recent call last):
  File "extract_key_sentences.py", line 195, in <module>
    run_sample()
  File "extract_key_sentences.py", line 179, in run_sample
    key_sents, key_phrases, extractive_summary_sentences = key_sent_extractor.extract_key_sents(text_list, title, question)
  File "extract_key_sentences.py", line 100, in extract_key_sents
    extractive_summary = self.summarizer('\n'.join(article_body_text_list), self.summarizer_num_sentences_to_extract, return_as_list=True)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/summarizer/summary_processor.py", line 235, in __call__
    use_first, algorithm, num_sentences, return_as_list)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/summarizer/summary_processor.py", line 202, in run
    sentences, _ = self.cluster_runner(sentences, ratio, algorithm, use_first, num_sentences)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/summarizer/summary_processor.py", line 120, in cluster_runner
    hidden, algorithm, random_state=self.random_state).cluster(ratio, num_sentences)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/summarizer/cluster_features.py", line 149, in cluster
    model = self._get_model(k).fit(self.features)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/sklearn/cluster/_kmeans.py", line 984, in fit
    self._check_params(X)
  File "/Users/name/anaconda3/envs/env/lib/python3.7/site-packages/sklearn/cluster/_kmeans.py", line 812, in _check_params
    raise ValueError(f"n_samples={X.shape[0]} should be >= "
ValueError: n_samples=4 should be >= n_clusters=20.

Can you please help address this?

chagri commented 2 years ago

The cause was on my side: I passed the value positionally instead of through the "num_sentences" keyword argument. Passing it by keyword fixes the error:

result = model(body, num_sentences=10, return_as_list=True)
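For anyone hitting the same error, here is a minimal sketch of why the positional call misbehaves. It does not use the library itself: `pick_cluster_count` and `clamp_cluster_count` are hypothetical helpers, and the parameter order (a `ratio` parameter sitting before `num_sentences`) is an assumption inferred from the traceback, where `cluster(ratio, num_sentences)` receives both values. Under that assumption, a positional 10 is read as a ratio, and 4 sentences times 10 gives the 40 clusters named in the issue title.

```python
# Hypothetical stand-in for how the cluster count might be chosen.
# The parameter order (ratio before num_sentences) is an assumption,
# not code copied from bert-extractive-summarizer.
def pick_cluster_count(n_sentences, ratio=0.2, num_sentences=None):
    """Return how many clusters would be requested for n_sentences inputs."""
    if num_sentences is not None:
        return num_sentences
    return max(1, int(n_sentences * ratio))


def clamp_cluster_count(k, n_sentences):
    """Defensive guard: never ask for more clusters than sentences."""
    return max(1, min(k, n_sentences))


n_sentences = 4  # the traceback reports n_samples=4

# Positional call: 10 binds to ratio, so 4 * 10 clusters are requested.
positional_k = pick_cluster_count(n_sentences, 10)

# Keyword call: 10 is treated as the desired sentence count.
keyword_k = pick_cluster_count(n_sentences, num_sentences=10)

# Caller-side guard for short inputs.
safe_k = clamp_cluster_count(keyword_k, n_sentences)

print(positional_k, keyword_k, safe_k)
```

Whether the library itself caps the count at the number of available sentences is not shown in this thread, so capping the requested value on the caller side, as `clamp_cluster_count` does, may be a reasonable precaution for short inputs.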