Natooz / MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶
https://miditok.readthedocs.io/
MIT License

Defining Custom Tokens #171

Closed MikeMpapa closed 5 months ago

MikeMpapa commented 6 months ago

Hi, and thanks for the awesome work! How could I modify a tokenization scheme to add custom tokens related to MIDI metadata to the vocabulary?

Natooz commented 6 months ago

Hi, thank you for the kind words. The tokenizer.add_to_vocab method should be what you are looking for: it lets you add custom tokens to the vocabulary. However, the tokenizers implemented in MidiTok will not use these tokens on their own; it is up to you to insert them at the appropriate indexes in the token sequences they produce. Alternatively, you can subclass one of the tokenizer classes and override the relevant methods, which can make it easier to inject your custom tokens at the right places.
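
The first option can be illustrated with a minimal, library-free sketch. It uses a plain dict as a stand-in for the tokenizer's vocabulary, so it runs without MidiTok installed. The helper names (`add_to_vocab` as a free function, `prepend_metadata`), the toy vocabulary, and the `Genre_Jazz` / `Composer_Unknown` tokens are all assumptions made for illustration, not part of MidiTok. With a real tokenizer, the registration step would presumably be a call to tokenizer.add_to_vocab, and the ids would come from the tokenizer's own vocabulary.

```python
from typing import Dict, List


def add_to_vocab(vocab: Dict[str, int], token: str) -> int:
    """Register `token` in `vocab` if absent and return its id.

    Hypothetical stand-in for the tokenizer's own add_to_vocab method.
    """
    if token not in vocab:
        vocab[token] = len(vocab)  # next free id
    return vocab[token]


def prepend_metadata(
    ids: List[int],
    vocab: Dict[str, int],
    metadata_tokens: List[str],
    bos_token: str = "BOS_None",
) -> List[int]:
    """Insert metadata token ids right after BOS, or at the front if there is no BOS."""
    meta_ids = [vocab[tok] for tok in metadata_tokens]
    insert_at = 1 if ids and ids[0] == vocab.get(bos_token) else 0
    return ids[:insert_at] + meta_ids + ids[insert_at:]


# Toy vocabulary (assumed, for illustration only).
vocab = {
    "PAD_None": 0,
    "BOS_None": 1,
    "EOS_None": 2,
    "Pitch_60": 3,
    "Velocity_100": 4,
    "Duration_1.0.8": 5,
}

# Step 1: register the custom metadata tokens.
for tok in ("Genre_Jazz", "Composer_Unknown"):
    add_to_vocab(vocab, tok)

# Step 2: inject them into a sequence produced by the (unmodified) tokenizer.
ids = [1, 3, 4, 5, 2]  # BOS, Pitch, Velocity, Duration, EOS
with_meta = prepend_metadata(ids, vocab, ["Genre_Jazz", "Composer_Unknown"])
print(with_meta)  # [1, 6, 7, 3, 4, 5, 2]
```

Placing the metadata right after the BOS token is only one possible choice; the appropriate position depends on how the model is expected to consume these tokens. When decoding, the injected ids would likely need to be stripped before handing the sequence back to the tokenizer, since it does not know about them.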

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.