THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

Confused about vocab.py/get_control_mapping #52

Closed chenvega closed 5 years ago

chenvega commented 5 years ago

Great project! Thanks for your contribution. While reading through the whole project, I got confused about the get_control_mapping method in vocab.py. This method seems to convert the vocabulary table from a list into a vocabulary mapping stored as a dictionary. But the parameter "symbols" is not used in this method, since the vocabulary table already contains the control symbols. And even if those symbols do not exist in the vocabulary table, this method cannot add them to the mapping dictionary.

Playinf commented 5 years ago

@chenvega The get_control_mapping method is used to convert control symbols (e.g. the end-of-sentence symbol) to numeric ids. For example, we can use params.mapping[params.eos] to find eos_id, which is useful during decoding.
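
To make the behaviour described above concrete, here is a minimal sketch of such a lookup. It is not THUMT's actual code: the function name `sketch_control_mapping`, the toy vocabulary, and the tokens `<pad>`, `<eos>`, `<unk>` are all assumptions chosen for illustration. It shows how a "symbols" argument could serve as a filter, so that the result holds only the control symbols and their ids rather than the whole vocabulary.

```python
def sketch_control_mapping(vocab, symbols):
    """Return {symbol: index} for every control symbol present in vocab.

    Hypothetical helper for illustration only; not the THUMT implementation.
    """
    wanted = set(symbols)
    mapping = {}
    for index, token in enumerate(vocab):
        # Only tokens listed in `symbols` are kept; ordinary words are skipped.
        if token in wanted:
            mapping[token] = index
    return mapping


# Toy vocabulary (assumed): control symbols first, then ordinary words.
vocab = ["<pad>", "<eos>", "<unk>", "hello", "world"]
control_symbols = ["<pad>", "<eos>", "<unk>"]

mapping = sketch_control_mapping(vocab, control_symbols)
eos_id = mapping["<eos>"]  # numeric id of the end-of-sentence symbol

# A symbol that is missing from the vocabulary is simply absent from the result.
missing = sketch_control_mapping(["hello", "world"], control_symbols)
```

In this sketch a control symbol that is absent from the vocabulary is not added to the mapping, which matches the observation in the question: the lookup only reports ids, it does not extend the vocabulary.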