databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Speeding up pepatac for tiny samples #193

Closed nsheff closed 3 years ago

nsheff commented 3 years ago

Right now, there are a few steps in the pipeline that take a long time to run regardless of the size of the sample.

This is problematic because even a minimal test sample takes about 15 minutes to complete. That is one factor preventing us from running simple CI/CD tests, because they would be prohibitively compute-intensive.

I think it would be worth looking a bit deeper into this to see what it would take to make the pipeline finish in under a minute when the sample inputs are very small.
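One way to start that investigation is to time each step on a tiny sample and rank the results. The following is only a minimal sketch, not part of pepatac: the helper names `timed` and `slowest_steps` and the step labels are hypothetical, and the `time.sleep` calls stand in for real pipeline commands.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per step (step labels are hypothetical).
timings = {}


@contextmanager
def timed(step_name, results=timings):
    """Record the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # Sum repeated runs of the same step instead of overwriting them.
        results[step_name] = results.get(step_name, 0.0) + elapsed


def slowest_steps(results, top=3):
    """Return the `top` (step, seconds) pairs, slowest first."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Placeholders for real pipeline steps; replace with actual calls.
    with timed("alignment"):
        time.sleep(0.01)
    with timed("signal_tracks"):
        time.sleep(0.05)

    for name, seconds in slowest_steps(timings):
        print(f"{name}: {seconds:.3f}s")
```

Running this kind of wrapper on both a tiny and a normal-sized sample would show which steps have a fixed cost that does not shrink with the input.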

jpsmith5 commented 3 years ago

There are long-standing issues, which I expect to persist, that prevent this even for small samples. Chiefly, wig/bigWig generation is the most time-consuming step, even with a small sample. We've investigated C-driven versions of this process but repeatedly hit non-reproducible bugs: different runs yielded different outcomes, with no clear point in the process causing the problem. Closing for now, but will continue investigating as time permits.