larsbun opened this issue 1 year ago
Hi @larsbun, it looks to me like you are trying to use https://github.com/helpmefindaname/transformer-smaller-training-vocab with a tokenizer that is not supported.
That said, it should work if you don't set reduce_transformer_vocab=True on the trainer.train method.
If you still want to use that library, you can open an issue there detailing your tokenizer class to get support.
To be specific, it's the SequenceTagger in flair which calls reduce_transformer_vocab, and I am wondering what it takes to make this tokenizer supported. It is not at all clear to me what reduce_transformer_vocab actually does, why it is necessary, or what is required to support it. I tried setting reduce_transformer_vocab=False for trainer.train, but the result was the same.
Hi @larsbun, when you talk about specifics, it would help if you shared the version you are using and the code you were running. The SequenceTagger itself does not run anything; the trainer does when that option is activated. Assuming you are on the latest version, this is either a bug or an issue with the parameters.
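For readers wondering what vocabulary reduction does conceptually, here is a toy sketch in plain Python. This is not the library's actual implementation (the real library also preserves special tokens such as the unknown token and restores the full embedding matrix after training); it only illustrates the core idea: before fine-tuning, shrink the model's token embedding matrix to just the tokens that occur in the training corpus, and keep a mapping so the reduction can be undone.

```python
def reduce_vocab(embedding_matrix, vocab, corpus_tokens):
    """Toy vocabulary reduction (illustration only, not the library's code).

    embedding_matrix: list of row vectors, one row per vocab entry
    vocab: dict mapping token -> row index in embedding_matrix
    corpus_tokens: iterable of tokens that appear in the training data
    Returns (reduced_matrix, reduced_vocab, kept_old_indices).
    """
    # Indices of embedding rows actually used by the corpus, in stable order.
    used = sorted({vocab[t] for t in corpus_tokens if t in vocab})
    # Keep only those rows; this is what shrinks the model.
    reduced_matrix = [embedding_matrix[i] for i in used]
    # Remap surviving tokens to their new, contiguous row indices.
    old_to_new = {old: new for new, old in enumerate(used)}
    reduced_vocab = {t: old_to_new[i] for t, i in vocab.items() if i in old_to_new}
    return reduced_matrix, reduced_vocab, used

# Tiny example: a 5-token vocab, but the corpus uses only 3 of the tokens.
matrix = [[0.0], [1.0], [2.0], [3.0], [4.0]]
vocab = {"<unk>": 0, "the": 1, "cat": 2, "dog": 3, "sat": 4}
reduced, rvocab, kept = reduce_vocab(matrix, vocab, ["the", "cat", "the", "sat"])
# reduced now has 3 rows instead of 5; kept records which original rows
# survived so the full matrix can be restored after training.
```

This also shows why an unsupported tokenizer is a problem: the reduction step needs to enumerate the tokenizer's vocabulary and rewrite it consistently, which requires per-tokenizer-class handling.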
Question
Hi,
I am working on using embeddings from a pre-trained model which is not published. When I try to load it as a TransformerWordEmbeddings, it fails with this error message:
I tried looking at the code in the relevant places, but there were so many layers of abstraction that I was unable to work it out. I do suspect, however, that only a small detail is missing. How can I find out what the implementation for setting the vocabulary should look like (and thereby fix it)?