Hi DNABert Team,

The Quick Start code you provide fails for me. Here is the code I ran:
```python
import torch
from transformers import AutoTokenizer, AutoModel

# device is used below; define it so the snippet is self-contained
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True).to(device)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"].to(device)
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape)  # expect to be 768

# embedding with max pooling
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape)  # expect to be 768
```
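To check the pooling step on its own, without downloading the model, here is a minimal sketch that uses a random tensor as a stand-in for `model(inputs)[0]`. The sequence length of 17 is arbitrary, the hidden size of 768 is taken from the comments in the Quick Start code, and the helper name `pool` is hypothetical.

```python
import torch


def pool(hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Mean- and max-pool a [batch, seq_len, hidden] tensor over the token axis."""
    mean_emb = hidden_states.mean(dim=1)       # [batch, hidden]
    max_emb = hidden_states.max(dim=1).values  # [batch, hidden]
    return mean_emb, max_emb


# Random stand-in for model(inputs)[0]: 1 sequence, 17 tokens, 768 features
fake_hidden = torch.randn(1, 17, 768)
mean_emb, max_emb = pool(fake_hidden)
print(mean_emb.shape, max_emb.shape)  # torch.Size([1, 768]) torch.Size([1, 768])
```

Pooling over `dim=1` keeps the batch axis, so the same function works for several sequences at once; for padded batches the attention mask would also need to be taken into account, which this sketch does not do.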
Error log:

```
Traceback (most recent call last):
  File "/workspace/work/CLIP/DNA/DNA_emb.py", line 22, in <module>
    model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 587, in register
    raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.dd10f74f0e90735d02a27603e56467761893e8f9.configuration_bert.BertConfig'>. Fix one of those so they match!
```
I managed to get it to run by loading the config explicitly with `BertConfig`:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, BertConfig

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModelForMaskedLM.from_config(config).to(device)
```
However, the resulting embedding dimension is 4096 instead of the expected 768.
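My guess (not confirmed) is that the 4096 comes from the masked-LM head rather than from the encoder: `AutoModelForMaskedLM` returns vocabulary logits at index 0, not hidden states, and I believe DNABERT-2 uses a 4096-token vocabulary. Note also that `from_config` builds the model with randomly initialized weights instead of loading the pretrained checkpoint. The sketch below illustrates only the shape difference, using random tensors and a plain `nn.Linear` as a simplified stand-in for the MLM head (Hugging Face's `BertForMaskedLM` head also has a transform layer before the vocabulary projection).

```python
import torch
from torch import nn

HIDDEN_SIZE = 768   # encoder width, as in the Quick Start comments
VOCAB_SIZE = 4096   # assumed DNABERT-2 vocabulary size

seq_len = 17  # arbitrary token count for this illustration
hidden_states = torch.randn(1, seq_len, HIDDEN_SIZE)  # stand-in for AutoModel(...)[0]

# Simplified masked-LM head: projects each token's hidden state onto the vocabulary
mlm_head = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)
with torch.no_grad():
    logits = mlm_head(hidden_states)  # stand-in for AutoModelForMaskedLM(...)[0]

# Mean pooling over tokens, as in the Quick Start code
print(torch.mean(hidden_states[0], dim=0).shape)  # torch.Size([768])
print(torch.mean(logits[0], dim=0).shape)         # torch.Size([4096])
```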
Could you help me out? Thanks a lot.