LanguageMachines / ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to prepare your text for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can easily be extended to other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
https://languagemachines.github.io/ucto
GNU General Public License v3.0

Searching config files fails if $HOME is inaccessible #97

Closed: proycon closed this 1 month ago

proycon commented 1 month ago

I came across an edge case where ucto ran as user www-data (which has no home directory of its own) while the variable $HOME was still set to /root. As a result, ucto tried to find its config files in /root/.config (localConfigDir). That location was inaccessible (permission denied; in fact it did not even exist), and instead of falling back to defaultConfigDir (e.g. /usr/share/ucto), which did exist, ucto exited with an error:

Permission denied [/root/.config/ucto/tokconfig-nld]

Proposed solution: do not fail hard on such filesystem errors during configuration discovery as long as there are fallback locations left to try.
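The proposed behaviour can be sketched as a search that treats an error on one candidate directory as "not here" and moves on, failing only after every candidate has been tried. This is a minimal Python illustration, not ucto's actual C++ code: the helper names `candidate_dirs` and `find_config` are hypothetical, and the two directories (`$HOME/.config/ucto` first, then `/usr/share/ucto`) are assumed from the paths quoted in this issue.

```python
from pathlib import Path


def candidate_dirs(home, default_dir="/usr/share/ucto"):
    """Return config directories in search order: per-user first, then system-wide.

    Assumption (hypothetical helper): the per-user directory is $HOME/.config/ucto
    and the system-wide one is /usr/share/ucto, as in the paths quoted in this issue.
    """
    dirs = []
    if home:  # an unset or blank $HOME skips the per-user directory entirely
        dirs.append(Path(home) / ".config" / "ucto")
    dirs.append(Path(default_dir))
    return dirs


def find_config(name, dirs):
    """Return the path of the first openable file called `name` in `dirs`.

    An error on one candidate (missing directory, permission denied, ...) is
    recorded and the search continues; it only fails once every candidate
    has been tried.
    """
    errors = []
    for d in dirs:
        path = d / name
        try:
            with open(path, "rb"):
                return path
        except OSError as exc:  # FileNotFoundError, PermissionError, NotADirectoryError, ...
            errors.append(f"{path} ({exc.strerror})")
    raise FileNotFoundError(f"{name} not found; tried: " + ", ".join(errors))
```

With $HOME=/root and an unreadable /root/.config, `find_config("tokconfig-nld", candidate_dirs(os.environ.get("HOME")))` would record the permission error for the per-user path and still return /usr/share/ucto/tokconfig-nld if that file exists. Collecting the per-candidate errors keeps the diagnostic useful when nothing is found at all.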

Workaround: make sure $HOME is either valid or simply blank; then ucto works fine.
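If ucto is launched from a Python service, the workaround could be applied by preparing the child process's environment before starting it. The sketch below is a hypothetical helper (`sanitized_env` is not part of ucto): it keeps $HOME if it is a usable directory, otherwise tries the current user's home from the Unix password database, and blanks it if that is unusable too. It is Unix-only because it relies on the `pwd` module.

```python
import os
import pwd  # Unix-only: password database lookups


def _usable_dir(path):
    """True if `path` is a directory we can list and enter."""
    return bool(path) and os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


def sanitized_env(base=None):
    """Return a copy of the environment whose HOME is either usable or blank."""
    env = dict(os.environ if base is None else base)
    if _usable_dir(env.get("HOME", "")):
        return env  # $HOME is fine, leave it alone
    try:
        # Fall back to the home directory recorded for the current user.
        real_home = pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:  # uid has no passwd entry (e.g. in some containers)
        real_home = ""
    env["HOME"] = real_home if _usable_dir(real_home) else ""
    return env
```

The result would then be passed as `subprocess.run(cmd, env=sanitized_env())`, where `cmd` is whatever ucto command line you already use. Running as www-data with HOME=/root, the stale /root value would be replaced by www-data's own home directory if it is usable, or by an empty string otherwise.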