czcorpus / InterText_editor

Editor for aligned parallel texts (personal desktop application).
http://wanthalf.saga.cz/intertext

picking a dictionary for hunalign #3

Closed by ngawangtrinley 5 years ago

ngawangtrinley commented 5 years ago

I can't find a way to select a dictionary for hunalign. It is crucial for the languages I'm working with (Tibetan and Chinese). Did I miss something in the Settings? If it's not possible, is it something you would consider adding to the tool?

wanthalf commented 5 years ago

Hello. It is described in the user guide in section 15.4. You can create different profiles to run hunalign with different dictionaries. Each profile defines the parameters to run hunalign with.

The default profile only refers to the "empty.dic" (it is really just an empty file), but you can create different dictionaries in the hunalign directory and create profiles for each of them. When aligning, you can select the proper profile with the dictionary you want.
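For readers who want to see what such a profile amounts to outside InterText, here is a minimal Python sketch that builds the argument list for a standalone hunalign run with a chosen dictionary. It is not InterText's profile syntax (that is described in the user guide, section 15.4). The directory name `hunalign`, the file names `source.txt`, `target.txt` and `bo-zh.dic`, and the helper `hunalign_command` are all hypothetical; only `empty.dic` comes from the thread. The argument order (options, dictionary, source, target) and the `-text` option follow hunalign's documented command line as far as I know, so check it against your hunalign version.

```python
from pathlib import Path


def hunalign_command(hunalign_dir, dictionary, source, target, binary="hunalign"):
    """Build the argument list for one hunalign run with a given dictionary.

    Hypothetical helper: the list is only built, never executed here.
    """
    base = Path(hunalign_dir)
    return [
        str(base / binary),
        "-text",                 # print aligned text rather than index pairs
        str(base / dictionary),  # dictionary file kept in the hunalign directory
        str(source),
        str(target),
    ]


# Two "profiles" in the sense used above: same run, different dictionary file.
default_cmd = hunalign_command("hunalign", "empty.dic", "source.txt", "target.txt")
custom_cmd = hunalign_command("hunalign", "bo-zh.dic", "source.txt", "target.txt")

print(default_cmd)
print(custom_cmd)
```

The resulting list could be passed to `subprocess.run` once the paths point at real files.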

ngawangtrinley commented 5 years ago

Thanks heaps. The guide is great, the tool flawless. Great job! I'll close the issue.

ngawangtrinley commented 5 years ago

I did a bit more testing and found that the dictionary isn't being loaded properly and isn't improving the alignment. Here's the error message I get: [image]

Do you have any idea how I could solve this?

Here's my setup: [image]

wanthalf commented 5 years ago

I don't see any error message, just warnings. How do you know the dictionary is not loaded?

We don't use dictionaries since we did not see any significant improvement when using them for most languages anyway. Hunalign seems to be good enough even without a dictionary. But it may be different for Asian languages.

I suppose you know how to make a dictionary for hunalign, that is, the format and the expected contents of the file. The contents also play an important role: the dictionary should contain the most frequent words, and you need to take care with polysemous words. An unsuitable dictionary may even make the results worse, I guess.
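As a rough illustration of the file format mentioned above, the following Python sketch writes dictionary lines in the form hunalign's documentation describes, as far as I recall it: one item per line, target-language phrase first, then ` @ `, then the source-language phrase, with the words of a multiword phrase separated by spaces. The function name and the English/German word pairs are placeholders, not taken from the thread. Since hunalign works on whitespace-separated tokens, text in scripts written without spaces between words would presumably need to be segmented first; treat that as an assumption to verify.

```python
def format_hunalign_dictionary(pairs):
    """Return dictionary text with one 'target @ source' item per line.

    `pairs` is an iterable of (source_phrase, target_phrase) tuples.
    """
    lines = []
    for source_phrase, target_phrase in pairs:
        # Normalise runs of whitespace so phrase words are single-space separated.
        source_phrase = " ".join(source_phrase.split())
        target_phrase = " ".join(target_phrase.split())
        if not source_phrase or not target_phrase:
            continue  # skip incomplete entries
        # Target phrase comes first, which is the reverse of the argument order.
        lines.append(f"{target_phrase} @ {source_phrase}")
    return "".join(line + "\n" for line in lines)


# Placeholder entries (source = English, target = German); the third is incomplete.
pairs = [("water", "Wasser"), ("drinking  water", "Trinkwasser"), ("", "leer")]
dictionary_text = format_hunalign_dictionary(pairs)
print(dictionary_text, end="")
```

The returned text can be written with `open(path, "w", encoding="utf-8")` into the hunalign directory next to `empty.dic`.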

ngawangtrinley commented 5 years ago

Thanks for the insight. I made a very small dictionary and used it on a very short text where hunalign should have been able to join two segments had it used the dictionary. The result was the same as without the dictionary.

We'll just stick to the default aligner. Thanks a lot for your quick answers.
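The with/without-dictionary check described above can be made mechanical. The sketch below compares two alignment outputs and reports whether the alignment actually changed. It assumes hunalign's default "ladder" output, taken here to be tab-separated lines beginning with a source index and a target index; that layout is an assumption to confirm on your own output. The sample strings are invented for illustration and are not results from the thread.

```python
def ladder_pairs(ladder_text):
    """Parse (source_index, target_index) pairs from ladder-style output."""
    pairs = []
    for line in ladder_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def ladders_differ(ladder_a, ladder_b):
    """True if the two runs produced different alignment points."""
    return ladder_pairs(ladder_a) != ladder_pairs(ladder_b)


# Invented sample outputs: a run with empty.dic and a run with a real dictionary.
without_dict = "0\t0\t0.3\n1\t1\t0.2\n2\t2\t0.1\n"
with_dict = "0\t0\t0.3\n2\t1\t0.5\n3\t2\t0.1\n"

print(ladders_differ(without_dict, with_dict))
```

Only the index columns are compared, so a change in scores alone would not count as a changed alignment.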

wanthalf commented 5 years ago

I cannot speak for hunalign, but it is a statistical tool, which means it is a kind of "black box" and does not always behave in a predictable or expected way. Many different measures are probably evaluated internally, and they may simply outweigh the dictionary hint, which is just one of them. You could try contacting the authors of hunalign and asking for more advice.