databio / gtars

Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.

Implement a basic fragment file tokenizer #23

Closed nleroy917 closed 1 month ago

nleroy917 commented 1 month ago

This will take a fragments.tsv.gz file and convert it into a series of .gtok files on disk that can be used for training.
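For illustration only, here is a rough sketch in plain Python of what the tokenization step could look like. It is not the gtars implementation. It assumes the fragment file is tab-separated with chrom, start, end, and barcode in the first four columns, that the "universe" is a list of non-overlapping regions, and that a token id is simply a region's position in that list. The names `load_universe`, `tokenize_fragment`, and `tokenize_fragments` are hypothetical. Serializing to .gtok is left out because the issue does not describe the file layout.

```python
import bisect
import gzip
from collections import defaultdict


def load_universe(regions):
    """Index (chrom, start, end) regions by chromosome.

    Assumption: a token id is the region's position in the input list.
    """
    by_chrom = defaultdict(list)
    for token_id, (chrom, start, end) in enumerate(regions):
        by_chrom[chrom].append((start, end, token_id))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def tokenize_fragment(universe, chrom, start, end):
    """Return ids of universe regions overlapping the half-open interval [start, end).

    Assumption: regions on a chromosome do not overlap each other, so
    sorting by start also sorts by end.
    """
    intervals = universe.get(chrom, [])
    # Index of the last region whose start is strictly below the fragment end.
    i = bisect.bisect_left(intervals, (end,)) - 1
    tokens = []
    # Walk left while regions still extend past the fragment start.
    while i >= 0 and intervals[i][1] > start:
        tokens.append(intervals[i][2])
        i -= 1
    return tokens[::-1]


def tokenize_fragments(path, universe):
    """Read a gzipped fragment file and collect token ids per barcode."""
    cells = defaultdict(list)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip comment/header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            cells[barcode].extend(
                tokenize_fragment(universe, chrom, int(start), int(end))
            )
    return cells
```

Calling `tokenize_fragments("fragments.tsv.gz", load_universe(regions))` returns an in-memory mapping from barcode to token ids. How that mapping gets split into files on disk is a design choice the issue leaves open.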

TODO: