Open KonradHoeffner opened 5 years ago
P.S.: I found a workaround to use the invisible character U+2000 (En Quad) but my phrases now all have the same size.
Can you please explain the "same size" issue?
Can you try the development version of wordcloud? I feel like we came across the issue before and hope we fixed it. If not I'll look into it.
I still get the same error. I tried the following regex to include compound nouns: regexp = r"(?<=')(?:\w+.?\w*)(?=')|(?:\w[\w']+)"
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-11-3ef17eac3749> in <module>
15 random_state=42,
16 regexp=regexp,
---> 17 ).generate(str(result))
18
19 print(wordcloud)
~\.conda\envs\ml\lib\site-packages\wordcloud\wordcloud.py in generate(self, text)
617 self
618 """
--> 619 return self.generate_from_text(text)
620
621 def _check_generated(self):
~\.conda\envs\ml\lib\site-packages\wordcloud\wordcloud.py in generate_from_text(self, text)
598 self
599 """
--> 600 words = self.process_text(text)
601 self.generate_from_frequencies(words)
602 return self
~\.conda\envs\ml\lib\site-packages\wordcloud\wordcloud.py in process_text(self, text)
575
576 if self.collocations:
--> 577 word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
578 else:
579 word_counts, _ = process_tokens(words, self.normalize_plurals)
~\.conda\envs\ml\lib\site-packages\wordcloud\tokenization.py in unigrams_and_bigrams(words, normalize_plurals)
54 # collocation detection (30 is arbitrary):
55 word1 = standard_form[bigram[0].lower()]
---> 56 word2 = standard_form[bigram[1].lower()]
57
58 if score(count, counts[word1], counts[word2], n_words) > 30:
KeyError: 'testing'
Thanks for the report. I won't have time to work on this for now, but feel free to investigate and send a PR.
I would like to work on this issue. I am a Master's student at BITS Pilani. It would be really helpful if I could get approval from the owner/author.
@tirth78 sure, go for it!
To make it work you need to set collocations=False, as collocation detection assumes spaces are used to separate words.
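For anyone wondering why a space inside a token would matter here, the following is a minimal sketch of the suspected mechanism, not the wordcloud source. It assumes that bigrams are stored as space-joined strings and later split on spaces again, which is consistent with the `standard_form[bigram[1].lower()]` lookup in the traceback above. The helper names `pairwise` and `find_unknown_bigram_part` and the sample text are made up for illustration.

```python
import re
from itertools import tee


def pairwise(items):
    """Yield consecutive pairs (a, b), (b, c), ... from items."""
    first, second = tee(items)
    next(second, None)
    return zip(first, second)


def find_unknown_bigram_part(text, regexp):
    """Return the first bigram piece missing from the unigram table, or None.

    Simplified stand-in for collocation detection; NOT the wordcloud code.
    """
    tokens = re.findall(regexp, text)
    # Lookup table keyed by lower-cased token, mirroring the role of
    # standard_form in the traceback.
    standard_form = {token.lower(): token for token in tokens}
    for pair in pairwise(tokens):
        # Assumption: a bigram is joined with a space and split on spaces again.
        bigram = tuple(" ".join(pair).split(" "))
        for part in bigram[:2]:
            if part.lower() not in standard_form:
                # A direct dictionary lookup would raise KeyError at this point.
                return part
    return None


text = "unit, testing tools"
print(find_unknown_bigram_part(text, r"\w[\w]+"))   # tokens cannot contain spaces
print(find_unknown_bigram_part(text, r"\w[\w ]+"))  # tokens may contain spaces
```

With the second pattern the token "testing tools" is split back into "testing" and "tools", neither of which is a known unigram, so the lookup for the second word fails. Under this assumption, turning collocations off skips the bigram step entirely, which would explain why the workaround helps.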
collocations=False results in error: unrecognized arguments: collocations=False, however --no_collocations works!
I will however keep this issue open and let @amueller decide whether this counts as solved or not.
Ideally, this would be enabled automatically when the regular expression includes a space, rather than crashing.
Tested using the newest version 1.9.2 from the Arch Linux python-wordcloud package.
Indeed, if you use the CLI it is --no_collocations, and collocations=False if you use the library. Thanks for the validation.
Description
word_cloud crashes when the regular expression includes a space.
Steps/Code to Reproduce
Create test.txt with:
wordcloud_cli --imagefile test.png --regexp "\w[\w]+" --text test.txt
works fine
wordcloud_cli --imagefile test.png --regexp "\w[\w ]+" --text test.txt
crashes with: