frederikkemarin / BEND

Benchmarking DNA Language Models on Biologically Meaningful Tasks
BSD 3-Clause "New" or "Revised" License

Cannot run Hyena-DNA #62

Closed. HelloWorldLTY closed this issue 2 months ago

HelloWorldLTY commented 2 months ago

Hi, I tried to compute embeddings with the largest HyenaDNA model, but I got the following error:

Traceback (most recent call last):
  File "/gpfs/radev/project/ying_rex/tl688/BEND/testcode_hyenadna.py", line 10, in <module>
    embedder = bend.embedders.HyenaDNAEmbedder('LongSafari/hyenadna-large-1m-seqlen')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/utils/embedders.py", line 63, in __init__
    self.load_model(*args, **kwargs)
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/utils/embedders.py", line 740, in load_model
    self.tokenizer = CharacterTokenizer(
                     ^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/models/hyena_dna.py", line 972, in __init__
    super().__init__(
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 432, in __init__
    super().__init__(**kwargs)
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1574, in __init__
    raise AttributeError(f"{key} conflicts with the method {key} in {self.__class__.__name__}")
AttributeError: add_special_tokens conflicts with the method add_special_tokens in CharacterTokenizer

It is raised by this line:

embedder = bend.embedders.HyenaDNAEmbedder('LongSafari/hyenadna-large-1m-seqlen')

Thanks a lot.

fteufel commented 2 months ago

Hi, I've just pushed an update to fix this.
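
For readers who hit the same traceback on an older checkout: the message suggests that the tokenizer base class rejects any keyword argument whose name matches one of its own methods, and that CharacterTokenizer forwarded a keyword named add_special_tokens to it. The sketch below reproduces that pattern with a toy base class. ToyTokenizerBase, BrokenCharacterTokenizer, FixedCharacterTokenizer and try_build are hypothetical names made up for illustration; this is neither the transformers implementation nor necessarily what the pushed update changed.

```python
# Minimal sketch of the failure mode, using a toy base class.
# Assumption: the real base class performs a similar "kwarg name vs. method
# name" check; all class and function names here are hypothetical.


class ToyTokenizerBase:
    """Stand-in for a tokenizer base class (NOT the transformers code)."""

    def __init__(self, **kwargs):
        # Reject keyword arguments that would shadow a method of the class.
        for key in kwargs:
            if hasattr(self, key) and callable(getattr(self, key)):
                raise AttributeError(
                    f"{key} conflicts with the method {key} in {self.__class__.__name__}"
                )
        self.init_kwargs = dict(kwargs)

    def add_special_tokens(self, tokens):
        # A method whose name collides with the offending keyword.
        return len(tokens)


class BrokenCharacterTokenizer(ToyTokenizerBase):
    def __init__(self, characters, **kwargs):
        self.characters = list(characters)
        # Forwarding this keyword triggers the conflict check above.
        super().__init__(add_special_tokens=False, **kwargs)


class FixedCharacterTokenizer(ToyTokenizerBase):
    def __init__(self, characters, **kwargs):
        self.characters = list(characters)
        # One possible workaround: drop the conflicting keyword before
        # calling the base constructor.
        kwargs.pop("add_special_tokens", None)
        super().__init__(**kwargs)


def try_build(cls):
    """Return "ok" if the tokenizer can be constructed, else the error text."""
    try:
        cls(["A", "C", "G", "T", "N"], model_max_length=1024)
        return "ok"
    except AttributeError as err:
        return str(err)


if __name__ == "__main__":
    print(try_build(BrokenCharacterTokenizer))
    print(try_build(FixedCharacterTokenizer))
```

If the same error appears with an up-to-date BEND checkout, comparing the installed transformers version against the one BEND was tested with is a reasonable next step.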