allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.
https://allenai.github.io/dolma/
Apache License 2.0

tagger_modules do not work in current git version #128

Closed: peterbjorgensen closed this issue 6 months ago

peterbjorgensen commented 6 months ago

The modules show up with `dolma list --tagger_modules mypackage.mymodule`, but `dolma tag --tagger_modules mypackage.mymodule ...` crashes. The problem is that the tagger modules are not loaded before this part of the code, which instantiates the taggers by name: https://github.com/allenai/dolma/blob/e7657e473ec3b46f2c98cd8f61dfab3955ba4755/python/dolma/core/runtime.py#L428-L433

This means the `dolma tag` command crashes with `ValueError: Unknown tagger mytagger ...`.
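The failure comes down to ordering. A tagger registered by name only exists in the registry once the module that defines it has been imported, so resolving names before importing `--tagger_modules` cannot succeed. Here is a minimal sketch of that pattern in plain Python. Everything in it is hypothetical: `tagger_registry`, `register`, `get_tagger`, `tag`, and `load_modules_first` are illustrative stand-ins, not dolma's actual runtime code, and the temporary-file setup exists only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Sequence, Tuple

# Hypothetical registry module (NOT dolma's real code): maps names to classes.
REGISTRY_SRC = '''
TAGGERS = {}

def register(name):
    """Class decorator that records a tagger class under `name`."""
    def decorator(cls):
        TAGGERS[name] = cls
        return cls
    return decorator

def get_tagger(name):
    """Resolve a tagger class by name; fails if its module was never imported."""
    if name not in TAGGERS:
        raise ValueError(f"Unknown tagger {name}")
    return TAGGERS[name]
'''

# Hypothetical user module: registration happens as a side effect of import.
USER_MODULE_SRC = '''
from tagger_registry import register

@register("mytagger")
class MyTagger:
    pass
'''


def setup_modules(root: Path) -> None:
    """Write tagger_registry.py and mypackage/mymodule.py under `root`."""
    (root / "tagger_registry.py").write_text(REGISTRY_SRC)
    pkg = root / "mypackage"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "mymodule.py").write_text(USER_MODULE_SRC)


def load_tagger_modules(modules: Sequence[str]) -> None:
    """Import each user module so its @register decorators run."""
    for module in modules:
        importlib.import_module(module)


def tag(names: Sequence[str], modules: Sequence[str], load_modules_first: bool):
    """Hypothetical tag entry point; `load_modules_first` toggles the ordering."""
    registry = importlib.import_module("tagger_registry")
    if load_modules_first:
        load_tagger_modules(modules)  # correct order: register, then resolve
    taggers = [registry.get_tagger(name)() for name in names]
    if not load_modules_first:
        load_tagger_modules(modules)  # buggy order: never reached on failure
    return taggers


def run_demo() -> Tuple[str, str]:
    """Run the buggy ordering, then the fixed one, in a fresh temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        setup_modules(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # files were created after startup
        error_message = ""
        try:
            tag(["mytagger"], ["mypackage.mymodule"], load_modules_first=False)
        except ValueError as err:
            error_message = str(err)
        taggers = tag(["mytagger"], ["mypackage.mymodule"], load_modules_first=True)
        return error_message, type(taggers[0]).__name__


error_message, tagger_name = run_demo()
print("buggy order:", error_message)  # buggy order: Unknown tagger mytagger
print("fixed order:", tagger_name)    # fixed order: MyTagger
```

The design point is that the registry never learns about a class until Python executes the module defining it. Listing modules (as `dolma list` does) triggers that import, which is why the taggers appear there even though name lookup in the tag path fails.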

soldni commented 6 months ago

Excellent catch!! Fixed in main.