chupinmaxime / wordcloud

Support for compound words #4

Open tferr opened 10 months ago

tferr commented 10 months ago

@chupinmaxime, thanks a lot for this. Really useful.

I don't seem to be able to include compound words such as "french fries" or "United States". Only the first word seems to be parsed. This appears to be the case even for hyphenated words (e.g., "check-in" appears as "check"). Surprisingly, the second word is still omitted when the hyphen is replaced by an underscore.

I am using the \wordcloudFile macro. Is there a way around this, perhaps by escaping the compound word with ""?

tferr commented 10 months ago

Also, if compound words are to be supported, it would be useful to have an option that allows the parts of a compound word to be separated by line breaks.

chupinmaxime commented 10 months ago

Thank you. OK, I'll try to allow passing a list of «compound words» to keep when the file is parsed. Does that seem OK to you? The line breaks will probably be ignored: in a complete text file, such cases should not appear often, no?

tferr commented 10 months ago

That would work. The text file I was parsing was programmatically generated, which is very specific to my use case. Meanwhile, I realized that in LuaTeX the \wordcloud[]{(word,weight);} syntax allows for compound words, so I can use that instead.
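
For anyone generating the input programmatically, the workaround above could be sketched roughly as follows. This is only an illustration, not part of the package: the helper names `wordcloud_entries` and `to_wordcloud_argument` are hypothetical, the word splitting is deliberately naive (ASCII letters only), and it assumes the argument of `\wordcloud` is a run of `(word,weight);` entries as in the syntax quoted above. Words containing TeX special characters would still need escaping.

```python
import re
from collections import Counter


def wordcloud_entries(text, compounds):
    """Count words in `text`, treating each phrase in `compounds` as one token.

    Hypothetical helper: not part of the wordcloud package.
    """
    counts = Counter()
    remaining = text
    # Handle longer phrases first so a phrase is not broken up by a shorter one.
    for phrase in sorted(compounds, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # Remove every occurrence of the phrase and record how many were found.
        remaining, n = pattern.subn(" ", remaining)
        if n:
            counts[phrase] += n
    # Whatever is left is counted as single, lowercased words.
    counts.update(w.lower() for w in re.findall(r"[A-Za-z]+", remaining))
    return counts


def to_wordcloud_argument(counts):
    """Format the counts as '(word,weight);' entries, heaviest first.

    Assumes this is the list format expected inside \\wordcloud[]{...}.
    """
    return "".join(f"({word},{weight});" for word, weight in counts.most_common())


if __name__ == "__main__":
    sample = "French fries and french fries in the United States"
    entries = wordcloud_entries(sample, ["french fries", "United States"])
    print(to_wordcloud_argument(entries))
```

The printed string can then be pasted (or written by the script) into the braces of `\wordcloud[]{...}` in a document compiled with LuaTeX, which sidesteps the file parser entirely.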