chupinmaxime / wordcloud

Support for compound words #4

Open tferr opened 1 year ago

tferr commented 1 year ago

@chupinmaxime, thanks a lot for this. Really useful.

I don't seem to be able to include compound words such as "french fries" or "United States". Only the first word seems to be parsed. This appears to be the case even for hyphenated words (e.g., "check-in" appears as "check"). Surprisingly, the second word is still omitted when the hyphen is replaced by an underscore.

I am using the \wordcloudFile macro. Is there a way around this, perhaps by escaping the compound word with ""?

tferr commented 1 year ago

Also, if compound words are to be supported, it would be useful to have an option that allows the parts of a compound word to be separated by line breaks.

chupinmaxime commented 1 year ago

Thank you. OK, I'll try to allow a list of «compound words» to be kept together when the file is parsed. Does that seem OK to you? The line breaks may simply be ignored, though: for a complete text file, such cases should not occur often enough to matter, no?

tferr commented 1 year ago

That would work. The text file I was parsing was programmatically generated, which is very specific to my use case. Meanwhile, I realized that in LuaTeX the \wordcloud[]{(word,weight);} syntax allows for compound words, so I can use that instead.
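
For anyone landing here with the same problem, this is a minimal sketch of that workaround. It assumes the package is loaded with \usepackage{wordcloud}, that the document is compiled with LuaLaTeX, and that several entries can be chained using the same (word,weight); pattern. The words and weights are made up for illustration, and the options are left empty as in the syntax quoted above.

```latex
% Compile with LuaLaTeX (the workaround above relies on LuaTeX).
\documentclass{article}
\usepackage{wordcloud} % assumed package name, matching the repository

\begin{document}

% Each entry is (word,weight); and the word part may contain spaces
% or hyphens. Weights here are illustrative only.
\wordcloud[]{(french fries,10);(United States,8);(check-in,5);(salad,3);}

\end{document}
```

Because the words are passed explicitly rather than parsed from a file, the compound entries are not split at the space or hyphen.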