eole-nlp / eole

Open language modeling toolkit based on PyTorch
https://eole-nlp.github.io/eole
MIT License

[fix] Allow to build_vocab with full train config, patch vocab validation #49

Closed by francoishernandez 3 weeks ago

francoishernandez commented 3 weeks ago

By default, all configs are set with `extra="forbid"`. To facilitate building the vocab with a full train config, e.g. in `wiki_103` or `wmt17`, we'll be more permissive for `BuildVocabConfig`.
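For illustration, here is a minimal sketch of how pydantic's `extra` setting changes what happens when a config receives keys it does not declare. The class names, the fields, and the choice of `extra="ignore"` for the permissive case are assumptions made for this example; they are not the actual eole classes, and which permissive value the PR uses is not stated here.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictConfig(BaseModel):
    # Hypothetical stand-in for a config that rejects unknown keys.
    model_config = ConfigDict(extra="forbid")
    src_vocab: str


class PermissiveBuildVocabConfig(BaseModel):
    # Hypothetical stand-in for a more permissive vocab-building config:
    # unknown keys are silently dropped instead of raising an error.
    model_config = ConfigDict(extra="ignore")
    src_vocab: str


# A "full train config" carries keys that vocab building does not need.
full_train_config = {"src_vocab": "vocab.src", "train_steps": 1000}

try:
    StrictConfig(**full_train_config)
    strict_accepts = True
except ValidationError:
    # The undeclared "train_steps" key is rejected under extra="forbid".
    strict_accepts = False

# The permissive model validates and keeps only the declared field.
permissive = PermissiveBuildVocabConfig(**full_train_config)
print(strict_accepts, permissive.model_dump())
```

With a setup like this, the same YAML file could in principle be passed to both the training and the vocab-building entry points without trimming it first.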

Note: the pydantic 2.8.0 upgrade triggers errors that were not raised before, which exposed some remaining issues in the vocab/data validation logic. This PR provides some patches, but this logic should probably be reviewed in more depth at some point. A somewhat cleaner approach might be to nest the `VocabConfig` under a `vocab` key and pop it when needed (as is already done for `data` in some places).
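As a rough sketch of the "nest and pop" idea, assuming the raw config is a plain dictionary: the helper name `split_vocab_section` and the example keys below are hypothetical, chosen only to show the pattern, and do not reflect the actual eole code.

```python
def split_vocab_section(raw_config: dict) -> tuple[dict, dict]:
    """Separate a nested "vocab" section from the rest of a raw config.

    Hypothetical helper: returns (vocab_section, remaining_config).
    """
    # Shallow copy so the caller's dictionary is left untouched.
    remaining = dict(raw_config)
    # Pop the nested section; fall back to an empty dict if it is absent.
    vocab_section = remaining.pop("vocab", {})
    return vocab_section, remaining


# Example raw config with vocab settings grouped under one key.
raw = {
    "vocab": {"src_vocab": "vocab.src", "share_vocab": True},
    "train_steps": 1000,
}

vocab_section, remaining = split_vocab_section(raw)
print(vocab_section)
print(remaining)
```

Grouping the vocab fields under one key would let each section be validated by its own model, rather than relying on a single flat config tolerating unrelated keys.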