frederikkemarin / BEND

Benchmarking DNA Language Models on Biologically Meaningful Tasks
BSD 3-Clause "New" or "Revised" License

Skip bad embeddings #35

Closed: fteufel closed this 1 year ago

fteufel commented 1 year ago

The embedding code isn't perfect. GENA-LM upsampling currently does not work well on sequences with long runs of N, because those segments get tokenized in unexpected ways. This PR skips bad embeddings and prints a warning.
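For illustration, a minimal sketch of the skip-and-warn pattern described above. It is not the code from this PR: `embed_all` and `toy_embed` are hypothetical names, and the sketch assumes that a "bad" embedding is one whose upsampled length does not match the input sequence length. The toy embedder collapses runs of N into a single position purely to reproduce such a mismatch.

```python
import warnings


def embed_all(sequences, embed_fn):
    """Embed each sequence; skip (with a warning) any embedding whose
    length does not match the sequence length.

    Hypothetical helper: the length check is an assumed definition of
    a "bad" embedding, not taken from the BEND code base.
    """
    kept = {}
    for idx, seq in enumerate(sequences):
        emb = embed_fn(seq)
        if len(emb) != len(seq):
            warnings.warn(
                f"Skipping sequence {idx}: embedding length {len(emb)} "
                f"!= sequence length {len(seq)}"
            )
            continue
        kept[idx] = emb
    return kept


def toy_embed(seq):
    """Hypothetical stand-in for a real embedder.

    Emits one 1-d vector per character, but collapses every run of 'N'
    into a single vector, so sequences with N runs come out too short.
    """
    out = []
    prev = None
    for ch in seq:
        if ch == "N" and prev == "N":
            continue  # drop repeated N, mimicking a failed upsampling
        out.append([float(ord(ch))])
        prev = ch
    return out


if __name__ == "__main__":
    result = embed_all(["ACGT", "ACNNNNGT", "ANGT"], toy_embed)
    print(sorted(result))  # indices of the sequences that were kept
```

Returning a dict keyed by the original index keeps the surviving embeddings aligned with their labels after some sequences have been skipped.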