Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.
Release `v0.0.13` -- Add fragment file tokenizer #24
This PR adds a new `FragmentTokenizer` that writes `.gtok` files directly for the barcoded cells in a `fragments.tsv.gz` file. It can also filter cells, if you have prior knowledge of which cells are high-quality and which are low-quality. This drastically speeds up the tokenization process, too.
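To make the idea concrete, here is a rough pure-Python sketch of per-barcode tokenization with optional cell filtering. It is not the `FragmentTokenizer` implementation or its API: the function name `tokenize_fragments`, its parameters, and the use of a region's list index as its token id are all assumptions made for illustration, and it returns an in-memory dict rather than writing `.gtok` files. It only assumes the usual fragments layout of tab-separated `chrom`, `start`, `end`, `barcode` columns.

```python
from collections import defaultdict


def tokenize_fragments(fragment_lines, universe, allowed_barcodes=None):
    """Map each barcode to the ids of the universe regions its fragments overlap.

    Hypothetical helper for illustration only (not part of the real bindings).

    fragment_lines: iterable of tab-separated lines (chrom, start, end, barcode, ...)
    universe: list of (chrom, start, end) regions; the list index is the token id
    allowed_barcodes: optional set of barcodes to keep; None keeps every cell
    """
    tokens = defaultdict(list)
    for line in fragment_lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' header lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end, barcode = line.split("\t")[:4]
        # Drop cells that are not in the caller-supplied set of good barcodes.
        if allowed_barcodes is not None and barcode not in allowed_barcodes:
            continue
        start, end = int(start), int(end)
        # Linear scan for clarity; a fast tokenizer would use an interval index.
        for token_id, (u_chrom, u_start, u_end) in enumerate(universe):
            if chrom == u_chrom and start < u_end and u_start < end:
                tokens[barcode].append(token_id)
    return dict(tokens)


# Toy data: three universe regions and three fragments from two cells.
universe = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
fragments = [
    "chr1\t10\t60\tAAAC\t1",
    "chr1\t250\t260\tAAAC\t2",
    "chr2\t5\t20\tTTTG\t1",
]

all_cells = tokenize_fragments(fragments, universe)
good_cells = tokenize_fragments(fragments, universe, allowed_barcodes={"AAAC"})
```

For a real file, the lines could come from `gzip.open(path, "rt")` instead of the toy list. Filtering before the overlap search means low-quality cells cost almost nothing, which is one plausible reason filtering helps with speed.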
I've added a super simple Python implementation in the bindings too: