HazyResearch / hyena-dna

Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
https://arxiv.org/abs/2306.15794
Apache License 2.0

Clarifying the models available on HF #39

Closed · yair-schiff closed 5 months ago

yair-schiff commented 6 months ago

Hi,

On the LongSafari HF organization page there appear to be two copies of each model: one with -hf at the end of the name and one without.

I was wondering what the difference is between these models (other than one being compatible with AutoModel): despite the names being the same and the variables in the config files looking almost identical (i.e., same d_model and n_layers), they have very different numbers of parameters. For example,

Which version of these models corresponds to the ones used in the paper's experiments? If I am not mistaken, it should be the first one (i.e., the one without -hf in the name)?

yair-schiff commented 6 months ago

@exnx, after digging into the two versions of each model, it appears that the main difference is in how the PositionalEmbedding modules are defined:

That is, in this repo, the PositionalEmbedding module has no learnable parameters:

        self.register("z", z, lr=lr_pos_emb)

because in the config files (e.g., in configs/experiment/hg38/hg38_hyena.yaml), lr_pos_emb = 0.0, so the code uses register_buffer (i.e., here).
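
(For context, the register helper in this codebase roughly follows the pattern below: an lr of 0.0 routes the tensor to register_buffer, and anything else makes it an nn.Parameter with a per-parameter learning-rate hint. This is a paraphrased sketch, not the verbatim source.)

    import torch.nn as nn

    class OptimModule(nn.Module):
        # Sketch of the register pattern: lr == 0.0 -> non-learnable buffer;
        # otherwise -> nn.Parameter carrying a per-parameter optimizer hint.
        def register(self, name, tensor, lr=None):
            if lr == 0.0:
                self.register_buffer(name, tensor)
            else:
                self.register_parameter(name, nn.Parameter(tensor))
                optim = {"weight_decay": 0.0}
                if lr is not None:
                    optim["lr"] = lr
                setattr(getattr(self, name), "_optim", optim)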

However, on HF, the version of each model that has -hf in the name uses this modeling code:

    self.z = nn.Parameter(z, requires_grad=True)

This increases the number of parameters for the -hf version of each model, especially for long sequence models.
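(A quick way to see the size difference, since the -hf checkpoints are AutoModel-compatible, is to count parameter and buffer elements after loading; a sketch, using one checkpoint name as an example:)

    from transformers import AutoModel

    # Example checkpoint; the same check applies to any of the LongSafari -hf models.
    name = "LongSafari/hyenadna-tiny-1k-seqlen-hf"
    model = AutoModel.from_pretrained(name, trust_remote_code=True)

    n_params = sum(p.numel() for p in model.parameters())
    n_buffer = sum(b.numel() for b in model.buffers())
    print(f"{name}: parameters={n_params:,}, buffer elements={n_buffer:,}")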


So I guess my question is which of these would be the "correct" model to compare to and which was used in the paper's experiments?

Rocketknight1 commented 5 months ago

Hi @yair-schiff, I think the version in this repo is more authoritative. This was an error in the HF port - I'll submit a fix soon, and hopefully the two versions should be equivalent after that!

yair-schiff commented 5 months ago

@Rocketknight1, thanks for following up. I should have posted back here after I did some more digging: the two models have equivalent weights. As you mention, I think it was just a small discrepancy in the HF port that made z "learnable". Thanks!

Rocketknight1 commented 5 months ago

@yair-schiff No probs! The code for the -hf models should now be updated with z as a buffer instead of a parameter.
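
(For anyone who wants to verify the fix locally, here is a quick check, assuming the positional-embedding tensor is still exposed under the name z in the custom modeling code: after the update it should appear among the model's buffers rather than its parameters.)

    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        "LongSafari/hyenadna-tiny-1k-seqlen-hf",  # example -hf checkpoint
        trust_remote_code=True,
    )

    # After the fix, "z" should be registered as a buffer, not a parameter.
    z_as_param = [n for n, _ in model.named_parameters() if n.endswith(".z") or n == "z"]
    z_as_buffer = [n for n, _ in model.named_buffers() if n.endswith(".z") or n == "z"]
    print("z registered as parameter in:", z_as_param)
    print("z registered as buffer in:", z_as_buffer)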