biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

Bam support #44

Open endrebak opened 7 years ago

endrebak commented 7 years ago

I think it is a bad idea for the below reasons. Feel free to suggest solutions:

You will probably rerun the analyses many times. Having to run a time-consuming conversion step (the most time-consuming one in the algorithm) each time would be silly. It is also IO-intensive, so parallel execution would not help much.

I am not just writing epic but also a lot of helper scripts for ChIP-Seq and differential ChIP-Seq. Adding a bam-to-bed conversion step to each of these before running them would be a waste.

Also, where should I store the temporary bed files? Overflowing /tmp/ dirs is an eternal issue.

If I were to stream the data to bed using pipes, epic would not be fast anymore. I get a massive speedup from multiple cores if I use text files, presumably because the OS already has the file cached in memory. This is not the case if I start the pipe with bamToBed blabla | ...

There are many things that can go wrong when converting bam to bed, due to wonky bam files. I would get a bunch of github issues about "epic not being able to use my bam files" if I were to silently convert to bed within my programs.

endrebak commented 7 years ago

I guess the best way of adding bam support would be to do the conversion before running the script, with a warning that I think using bams instead of beds is suboptimal. If the conversion fails, I'll throw an exception informing the user that the onus is on them to convert their wonky bam files to bed.
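
A minimal sketch of that warn-then-convert-or-raise behaviour, not what epic actually does. It assumes bedtools is installed and on the PATH (`bedtools bamtobed -i` writes bed to stdout). The names `convert_bam_to_bed` and `BamConversionError`, and the injectable `run` parameter, are made up for illustration.

```python
import os
import subprocess
import warnings


class BamConversionError(RuntimeError):
    """Hypothetical exception for a failed bam-to-bed conversion."""


def convert_bam_to_bed(bam_path, bed_path, run=subprocess.run):
    # Warn up front that bed input is the preferred format.
    warnings.warn(
        f"{bam_path}: converting bam to bed first; bed input is recommended.",
        UserWarning,
    )
    # Assumption: bedtools is available; its bed output goes to stdout.
    with open(bed_path, "w") as out:
        result = run(
            ["bedtools", "bamtobed", "-i", str(bam_path)],
            stdout=out,
            stderr=subprocess.PIPE,
            text=True,
        )
    if result.returncode != 0:
        # Do not leave a partial bed file behind for later runs to pick up.
        os.remove(bed_path)
        raise BamConversionError(
            f"Could not convert {bam_path} to bed. Please convert it "
            f"yourself and rerun with the bed file.\n{result.stderr}"
        )
    return bed_path
```

Passing `run` as a parameter is only there so the function can be exercised without bedtools; in normal use it defaults to `subprocess.run`.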

endrebak commented 7 years ago

My solution: if the input files are called path/to/file.bam, create a file path/to/file.bed. Do not delete it afterwards.
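
Roughly, the naming and reuse rule could look like the sketch below. It is only an illustration: `bed_path_for`, `ensure_bed` and the `convert` callback are hypothetical names, and the conversion itself is left to whatever callable is passed in.

```python
from pathlib import Path


def bed_path_for(input_path):
    """Map path/to/file.bam to path/to/file.bed; leave other paths as-is."""
    path = Path(input_path)
    return path.with_suffix(".bed") if path.suffix == ".bam" else path


def ensure_bed(input_path, convert):
    """Return a bed path for the input, converting only if needed.

    `convert(bam_path, bed_path)` is a hypothetical callable that writes
    the bed file. The bed file is kept afterwards, so later runs reuse it.
    """
    source = Path(input_path)
    bed = bed_path_for(source)
    if bed != source and not bed.exists():
        convert(source, bed)
    return bed
```

Note that this sketch reuses an existing file.bed without checking whether file.bam has changed since; comparing modification times would be one way to handle that, if it matters.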