MAGICS-LAB / DNABERT_2

[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Apache License 2.0

config_class error #49

Closed yudizhangzyd closed 9 months ago

yudizhangzyd commented 9 months ago

model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True) triggers an error:

The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>). Fix one of those so they match!

Any idea how to bypass this?

Another question: after obtaining the token embeddings, is there any way to convert them back into embeddings for each nucleotide? Thanks!

Zhihan1996 commented 9 months ago

Can you try to install transformers with pip install transformers==4.29?
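For reference, a minimal sketch of that fix (the version pin is the maintainer's suggestion above; the model ID is the one from the original post):

```shell
# Pin transformers to the version the maintainer suggests,
# then retry loading the model with its remote code enabled
pip install transformers==4.29
python -c "from transformers import AutoModel; AutoModel.from_pretrained('zhihan1996/DNABERT-2-117M', trust_remote_code=True)"
```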

yudizhangzyd commented 9 months ago

Thanks, that would work. Can you also answer my second question?

Zhihan1996 commented 9 months ago

Sure, and sorry for missing your second question. Obtaining nucleotide embeddings from the token embeddings is non-trivial. One option I can think of is to add the embedding of A/T/C/G (taken from the word-embedding layer) to the corresponding token embedding and use the sum as the nucleotide embedding, but I haven't experimented with this, so I don't know whether it is a good solution.
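A toy sketch of that workaround, with random vectors standing in for the real model's embeddings (the dimension, the tokenization, and the helper name are all illustrative, not part of DNABERT-2's API):

```python
import random

random.seed(0)
d = 8
# Hypothetical per-nucleotide embedding table, standing in for the model's
# word-embedding rows for A/T/C/G
nt_emb = {nt: [random.gauss(0, 1) for _ in range(d)] for nt in "ATCG"}

def expand_to_nucleotides(tokens, token_embs):
    """For each BPE token, repeat its embedding once per nucleotide it covers
    and add that nucleotide's own embedding (the workaround suggested above)."""
    per_nt = []
    for tok, emb in zip(tokens, token_embs):
        for nt in tok:
            per_nt.append([e + n for e, n in zip(emb, nt_emb[nt])])
    return per_nt

tokens = ["ATC", "GG", "TACA"]  # toy BPE segmentation of "ATCGGTACA"
token_embs = [[random.gauss(0, 1) for _ in range(d)] for _ in tokens]
nt_embs = expand_to_nucleotides(tokens, token_embs)
print(len(nt_embs))  # one embedding per nucleotide: 9
```

The result has one vector per nucleotide, so positions covered by the same BPE token share the token component but differ by their nucleotide component.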