frederikkemarin / BEND

Benchmarking DNA Language Models on Biologically Meaningful Tasks
BSD 3-Clause "New" or "Revised" License

A bug in the readme file #60

Closed HelloWorldLTY closed 2 months ago

HelloWorldLTY commented 2 months ago

Hi, I found that the example code in the readme file does not work:

from bend.embedders import NucleotideTransformerEmbedder

# load the embedder with a valid checkpoint name or path
embedder = NucleotideTransformerEmbedder('InstaDeepAI/nucleotide-transformer-2.5b-multi-species')

# embed a list of sequences
embeddings = embedder.embed(['AGGATGCCGAGAGTATATGGGA', 'CCCAACCGAGAGTATATGTTAT'])
# or just call directly to embed a single sequence
embedding = embedder('AGGATGCCGAGAGTATATGGGA') 

# This requires git LFS and will automatically download the checkpoint, if not already present
from bend.embedders import HyenaDNAEmbedder
embedder = HyenaDNAEmbedder('pretrained_models/hyenadna/hyenadna-tiny-1k-seqlen')

The error is:

Traceback (most recent call last):
  File "/gpfs/radev/project/ying_rex/tl688/BEND/testcode.py", line 4, in <module>
    from bend.embedders import NucleotideTransformerEmbedder
ModuleNotFoundError: No module named 'bend.embedders'

Could you please fix it? Thanks.

HelloWorldLTY commented 2 months ago

Furthermore, the DNABERT-2 embedder does not seem to work:

Traceback (most recent call last):
  File "/gpfs/radev/project/ying_rex/tl688/BEND/testcode.py", line 7, in <module>
    embedder = bend.embedders.DNABert2BertModel('zhihan1996/DNABERT-2-117M')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/models/dnabert2.py", line 569, in __init__
    super(BertModel, self).__init__(config)
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1357, in __init__
    raise ValueError(
ValueError: Parameter config in `BertModel(config)` should be an instance of class `PretrainedConfig`. To create a model from a pretrained model use `model = BertModel.from_pretrained(PRETRAINED_MODEL_NAME)`

fteufel commented 2 months ago

Yes, that was indeed wrong. It should have been bend.utils.embedders; this is fixed now.

As for the latter issue, please use DNABert2Embedder, not DNABert2BertModel.
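For readers hitting the same two errors, here is a minimal sketch of the corrected usage. The module path `bend.utils.embedders` and the class name `DNABert2Embedder` come from the reply above. The helper `resolve_class` is hypothetical (not part of BEND) and only turns a wrong import path into a readable error. Passing the checkpoint name to the constructor and calling `.embed(...)` is an assumption that mirrors the README pattern quoted earlier in this thread.

```python
import importlib


def resolve_class(class_name, module_paths):
    """Return `class_name` from the first module in `module_paths` that provides it.

    Hypothetical helper, not part of BEND: it tries each module path in order
    and raises one ImportError listing everything that was tried.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, class_name):
            return getattr(module, class_name)
        errors.append(f"{path}: no attribute {class_name!r}")
    raise ImportError(f"could not locate {class_name}; tried: " + "; ".join(errors))


# Intended usage with BEND installed (assumed to mirror the README example):
#
#   DNABert2Embedder = resolve_class("DNABert2Embedder", ["bend.utils.embedders"])
#   embedder = DNABert2Embedder("zhihan1996/DNABERT-2-117M")
#   embeddings = embedder.embed(["AGGATGCCGAGAGTATATGGGA"])
```

With BEND installed, the plain import `from bend.utils.embedders import DNABert2Embedder` is equivalent and simpler. The helper is only useful when a script has to run against checkouts that may still use a different module layout.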

HelloWorldLTY commented 2 months ago

Thanks a lot