How to convert the batch cell from the GenomicBenchmarks data to user data? CUDA memory overload if running "Single example" cell multiple times to produce embeddings.

Could you please help me with using HyenaDNA for inference? I'm trying to produce embeddings for a series of long sequences (about 1500 sequences of up to 400,000 nucleotides each). When I run the "Single example" cell from the Colab notebook, it works only once before CUDA memory fills up (torch.cuda.empty_cache() doesn't help) and the Colab session has to be restarted. Most likely I need to use the "Batch example" cell instead, but it seems to be designed around the GenomicBenchmarks dataset. Is there any way to repurpose it for user-supplied data? Effectively, I have a list of DNA sequence strings; how do I pass them to the model correctly in batch format?
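To make the question concrete, here is a rough sketch of the kind of loop I have in mind. It is not the notebook's code: `VOCAB`, `encode`, `pad_batch`, `iter_batches` and `embed_all` are hypothetical names, the character-to-id map is only a stand-in for the notebook's own tokenizer, and left padding is an assumption. It also assumes the model is loaded once outside the loop, that calling it on a tensor of token ids returns a tensor of embeddings, and that each result is moved to the CPU so GPU memory does not accumulate between batches.

```python
from typing import Iterator, List, Sequence

# Hypothetical character-to-id map, a stand-in for the notebook's tokenizer.
VOCAB = {"[PAD]": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5}


def encode(seq: str) -> List[int]:
    """Map a DNA string to token ids; unknown characters become N."""
    return [VOCAB.get(ch, VOCAB["N"]) for ch in seq.upper()]


def pad_batch(encoded: Sequence[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Left-pad every sequence to the longest one in the batch (assumed side)."""
    width = max(len(ids) for ids in encoded)
    return [[pad_id] * (width - len(ids)) + list(ids) for ids in encoded]


def iter_batches(seqs: Sequence[str], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield padded batches of token ids, batch_size sequences at a time."""
    for start in range(0, len(seqs), batch_size):
        chunk = seqs[start:start + batch_size]
        yield pad_batch([encode(s) for s in chunk])


def embed_all(model, seqs: Sequence[str], batch_size: int = 1, device: str = "cuda"):
    """Run the model over all sequences and return one CPU tensor per batch."""
    import torch  # imported here so the helpers above work without PyTorch

    model = model.to(device).eval()  # the model is loaded once by the caller
    outputs = []
    with torch.inference_mode():  # no autograd graph is kept between calls
        for batch in iter_batches(seqs, batch_size):
            ids = torch.tensor(batch, dtype=torch.long, device=device)
            emb = model(ids)  # assumption: returns a tensor of embeddings
            outputs.append(emb.cpu())  # move off the GPU before the next batch
            del ids, emb
    return outputs
```

With sequences of up to 400,000 nucleotides, `batch_size` would probably have to stay very small, and the per-batch outputs have different lengths, so they are returned as a list rather than concatenated.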

HazyResearch / hyena-dna

Issue #55