databio / gtars

Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.

Add ability for rust uniwig to create output files from input .bam files #30

Open donaldcampbelljr opened 1 month ago

donaldcampbelljr commented 1 month ago

Some work accomplished with PR #40.

We would like to use this code as a drop-in replacement for bamSitesToWig.py from PEPATAC: https://github.com/databio/pepatac/blob/master/tools/bamSitesToWig.py

bamSitesToWig.py creates three files as output:

    • [ ] exact.bw
    • [ ] smoothed.bw
    • [ ] shift.bed

Currently, uniwig can take an input file in one of these formats: bed, narrowPeak, or bam

and create output in one of these formats: wig, npy, bedGraph, or bw (via an intermediate bedGraph conversion)
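
For reference, here is a rough sketch of what the intermediate bedGraph step amounts to: collapsing runs of identical per-base counts into zero-based, half-open records. This is not uniwig's actual code. The `BedGraphRecord` struct and the `counts_to_bedgraph` and `format_record` helpers are hypothetical names used only for illustration, and skipping zero-valued runs is an assumption about the desired output rather than a documented behavior.

```rust
/// One bedGraph record: a zero-based, half-open interval plus a value.
/// (Hypothetical type for illustration; not part of uniwig.)
#[derive(Debug, PartialEq)]
pub struct BedGraphRecord {
    pub chrom: String,
    pub start: u32,
    pub end: u32,
    pub value: u32,
}

/// Collapse per-base counts into bedGraph records by run-length encoding.
/// `offset` is the zero-based coordinate of `counts[0]`.
/// Runs with a value of zero are skipped (an assumption, see above).
pub fn counts_to_bedgraph(chrom: &str, offset: u32, counts: &[u32]) -> Vec<BedGraphRecord> {
    let mut records = Vec::new();
    let mut i = 0usize;
    while i < counts.len() {
        let value = counts[i];
        // Extend the run while the next position carries the same count.
        let mut j = i + 1;
        while j < counts.len() && counts[j] == value {
            j += 1;
        }
        if value != 0 {
            records.push(BedGraphRecord {
                chrom: chrom.to_string(),
                start: offset + i as u32,
                end: offset + j as u32,
                value,
            });
        }
        i = j;
    }
    records
}

/// Render one record as a tab-separated bedGraph line.
pub fn format_record(r: &BedGraphRecord) -> String {
    format!("{}\t{}\t{}\t{}", r.chrom, r.start, r.end, r.value)
}
```

Calling `counts_to_bedgraph("chr1", 100, &[0, 2, 2, 3, 0, 0, 1])` yields three records; writing each through `format_record` gives the lines that a bedGraph-to-bw conversion would then consume.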

Some items to accomplish for this task:

Nice to have:

donaldcampbelljr commented 4 days ago

There is a working proof of concept in #40; however, it uses an intermediate bedGraph file written to disk.

Therefore, we are exploring an alternative method in #47 that streams values directly to the bigtools bw writer.

However, some challenges remain, namely: