Closed Vlad-Dembrovskyi closed 2 years ago
Brittany to research this on sumner.
@Vlad-Dembrovskyi (example bam files)
16G LIB11_Luminal/LIB11_Luminal.Aligned.sortedByCoord.out.bam
27G LIB1_Luminal/LIB1_1_Luminal.Aligned.sortedByCoord.out.bam
28G LIB5_Luminal/LIB5_2_Luminal.Aligned.sortedByCoord.out.bam
28G LIB7_Luminal/LIB7_2_Luminal.Aligned.sortedByCoord.out.bam
25G LIB9_Luminal/LIB9_3_Luminal.Aligned.sortedByCoord.out.bam
Task 1: research BAM compression options. Since BAM is already a compressed format, there is not much room for further compression; see https://www.biostars.org/p/420404/ for an example. Options: convert to CRAM, or use a more efficient compressor than gzip. The main limiting factor is CPU time: if it takes too long to reduce the size by 10-20%, it may not be worth it. Test the different options on big BAM files. Also check out https://academic.oup.com/bioinformatics/article/37/16/2225/6135077
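One way to run the Task 1 comparison is to time each candidate command and record how much space it saves. The sketch below is only an illustration: the helper names (`cram_command`, `percent_saved`, `benchmark`) are made up for this issue, it assumes `samtools` is on the PATH, and it assumes the reference FASTA used for the alignment is available, since CRAM conversion needs it.

```python
import os
import subprocess
import time


def cram_command(bam, cram, reference, threads=4):
    """Build a samtools call that converts a BAM file to CRAM.

    -C writes CRAM, -T names the reference FASTA, -@ sets extra threads.
    """
    return ["samtools", "view", "-C", "-T", reference,
            "-@", str(threads), "-o", cram, bam]


def percent_saved(original_bytes, compressed_bytes):
    """Return the size reduction as a percentage of the original size."""
    return 100.0 * (original_bytes - compressed_bytes) / original_bytes


def benchmark(command, source, output, run=subprocess.run):
    """Run one compression command; return (seconds elapsed, percent saved)."""
    start = time.perf_counter()
    run(command, check=True)
    elapsed = time.perf_counter() - start
    saved = percent_saved(os.path.getsize(source), os.path.getsize(output))
    return elapsed, saved
```

Calling `benchmark(cram_command(bam, cram, ref), bam, cram)` on each of the example files would give the time versus saved-space numbers needed to decide whether a 10-20% reduction is worth the CPU time. Other compressors can be compared by passing a different command list.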
Task 2: if testing shows meaningful compression, implement the compression at the end of the processes that produce BAMs, so that only compressed BAMs are saved in the results folder. This has to be controlled by an optional parameter that is true by default.
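The "optional parameter, true by default" behaviour can be illustrated with a small Python sketch. The flag name `--compress-bam` is hypothetical and not taken from the pipeline; in the pipeline itself this would be a pipeline parameter rather than a command-line flag. `argparse.BooleanOptionalAction` requires Python 3.9 or newer.

```python
import argparse


def build_parser():
    """Create a parser with a compression switch that is on by default."""
    parser = argparse.ArgumentParser(description="BAM output options (sketch)")
    # BooleanOptionalAction also generates the --no-compress-bam form.
    parser.add_argument(
        "--compress-bam",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="save only compressed alignments in the results folder",
    )
    return parser


def should_compress(argv):
    """Return True unless the user explicitly turned compression off."""
    return build_parser().parse_args(argv).compress_bam
```

With no arguments `should_compress([])` is true, so compression happens unless the user passes `--no-compress-bam`.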
If it is worth it in terms of space (the size drops a lot), then go for it.
Add a step to zip the file at the end of the STAR process.
If we start from the second part of the pipeline, we start from StringTie, so add conditional unzipping there (remember that we also need the .bai index).
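The conditional unzipping before StringTie could look roughly like the sketch below. It assumes that CRAM is the compression chosen in Task 1, which the issue has not decided yet, and the function name `restore_commands` is hypothetical. It only builds the `samtools` commands; running them is left to the caller.

```python
from pathlib import Path


def restore_commands(alignment, reference, bai_exists=False):
    """Return the samtools commands needed before StringTie can start.

    CRAM input: convert back to BAM, then build the .bai index.
    BAM input without an index: only build the index.
    BAM input with an index: nothing to do.
    """
    path = Path(alignment)
    if path.suffix == ".cram":
        bam = str(path.with_suffix(".bam"))
        return [
            ["samtools", "view", "-b", "-T", reference, "-o", bam, str(path)],
            ["samtools", "index", bam],
        ]
    if path.suffix == ".bam" and not bai_exists:
        return [["samtools", "index", str(path)]]
    return []
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Returning an empty list for an already indexed BAM keeps the step safe to run whether or not compression was enabled upstream.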