HazyResearch / hyena-dna

Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
https://arxiv.org/abs/2306.15794
Apache License 2.0

Pretraining runtimes from the paper #46

Open sgalkina opened 5 months ago

sgalkina commented 5 months ago

Hi! Great work, and also a great YouTube presentation; thanks for making that public.

I have a question about the runtimes. Table A.2 says that pretraining took 80 min for the model with 1.6M parameters. When I pretrain a model with 3.3M parameters (input size 16k, 3 Hyena layers, embedding dim 256) on my own dataset, it takes around 16 hours for a dataset of only 21,000 samples. Is anything wrong with my setup? Could you please specify more explicitly what data went into Table A.2: how many samples, of what sequence length, with what batch size? And if it's possible to tell, what share of the nucleotides of the human genome did the pretrained model (e.g., the one with batch size 32k) end up seeing?

Thank you for the nice work!
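As a back-of-the-envelope sketch of the "share of the genome seen" question, using only the numbers quoted above (the epoch count is a placeholder assumption, not a value from the paper):

```python
# Rough estimate of how many nucleotides one pretraining run sees.
# All inputs are taken from the question above except `epochs`,
# which is a hypothetical assumption.
samples = 21_000          # sequences in the dataset
seq_len = 16_000          # nucleotides (tokens) per sequence
epochs = 1                # placeholder number of passes over the data

tokens_seen = samples * seq_len * epochs
human_genome_bp = 3.1e9   # approximate haploid human genome size

print(f"tokens seen:     {tokens_seen:,}")                      # 336,000,000
print(f"share of genome: {tokens_seen / human_genome_bp:.1%}")  # ~10.8%
```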

exnx commented 4 months ago

In Table A.2 we only report pretraining time for the tiny 2-layer, d_model=256, seq_len=1k model used for the Nucleotide Transformer datasets. In general, the bigger the model or the longer the sequence, the longer the training time. In Section A.1 of the appendix, you'll see that the 1M context length model (our biggest) was trained for 4 weeks.
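To make that scaling concrete, here is a minimal sketch of the relative per-sequence compute of the two settings discussed in this thread. It assumes cost grows linearly with layer count and roughly O(L log L) with sequence length (the FFT convolution in Hyena); d_model is 256 in both settings and cancels out. These scaling assumptions are illustrative, not numbers from the paper:

```python
import math

def relative_cost(n_layer: int, seq_len: int) -> float:
    """Relative training cost per sequence, assuming cost scales
    linearly in depth and O(L log L) in sequence length."""
    return n_layer * seq_len * math.log2(seq_len)

paper = relative_cost(n_layer=2, seq_len=1_000)   # Table A.2 tiny model
mine = relative_cost(n_layer=3, seq_len=16_000)   # setting from the question

print(f"~{mine / paper:.0f}x more compute per sequence")  # roughly 34x
```

This ignores batch size, hardware, and dataset size, which also differ between the two runs, but it shows why a 16k-context model is not comparable to the 1k-context timing in Table A.2.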