databio / gtars

Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.

Better tokenization API #13

Closed: nleroy917 closed this 2 months ago

nleroy917 commented 2 months ago

This PR is still in progress, but now that I am close, I am opening it to track everything. It mostly just implements a better API (in my opinion) for the tokenizers.

TODO:

(No need to wait for anything since this is merging to dev)

nleroy917 commented 2 months ago

I think that this is ready. Will merge.