Open Ying-Zhou-428 opened 3 years ago
Perhaps the IO on your system is extremely slow or you have a very large number of small contigs? Those are the most common causes of poor performance.
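One quick way to sanity-check write throughput in the working directory is a dd run with fsync (a rough sketch; the file name and size are illustrative, and dd reports the rate on stderr when it finishes):

```shell
# Write a 64 MB test file with fsync so the rate reflects the disk,
# not the page cache; dd prints the throughput at the end.
dd if=/dev/zero of=ddtest.bin bs=1M count=64 conv=fsync
rm ddtest.bin
```

If the reported rate is in the low single-digit MB/s range, slow IO is a likely culprit.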
Thank you very much for your reply. bamCoverage was run on a high-performance computing platform, so can I assume the IO is adequate for this amount of computation? This might sound stupid, but I do not know how to check or count the contigs. Could you please elaborate? Thanks a lot.
Run samtools view -H on the BAM file and count the number of lines starting with @SQ. That's the number of contigs in your genome. If it's a high number (e.g., 7,000 or 50,000), then that's the reason for the poor performance.
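The counting step itself is just a grep. On a real file you would pipe the header from samtools (shown in a comment below); the same command is demonstrated here on a small simulated header so it can be tried anywhere:

```shell
# On a real BAM: samtools view -H input.sort.bam | grep -c '^@SQ'
# Simulated two-contig header (hypothetical names/lengths):
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n' > header.sam
grep -c '^@SQ' header.sam   # prints 2, the number of contigs
```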
Thank you very much. The BAM file has around 500 contigs, so I think it is not a very large number. What do you think about randomly sampling 30% of the total sequencing reads and then running bamCoverage?
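For the subsampling idea, samtools view -s can downsample a BAM directly (the filenames below are hypothetical); the same keep-each-read-with-probability-0.3 idea is sketched on plain text so it can be tried without a BAM file:

```shell
# On a BAM: samtools view -s 42.30 -b input.sort.bam -o input.sub30.bam
# (seed 42, fraction 0.30), then index it and run bamCoverage as before.
# Text-file sketch of the same random-sampling step:
seq 1 10000 > reads.txt
awk 'BEGIN { srand(42) } rand() < 0.30' reads.txt > reads.sub30.txt
wc -l < reads.sub30.txt   # roughly 3000 lines survive
```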
Also, I found that bamCoverage can handle files of several dozen gigabytes, so my files are not too large to be processed. The temporary bam.bai files were produced within a few minutes; it seemed the process got stuck after the bam.bai stage.
I haven't a clue why things are so slow on your system.
Yeah, I understand. Thank you very much for your help.
Hi everyone,
I use bamCoverage (bamCoverage version 2.5.3, Python 2.7.5) to convert sorted BAM files to bigWig files for visualization in the UCSC Genome Browser. It usually takes less than an hour to convert a 2 GB BAM file to a bigWig file. I recently used the same command to process an 8.2 GB BAM file and noticed that bamCoverage was terribly slow: it ran for several days and no bigWig file was generated. I noticed the bam.bai files were generated within several minutes.
I have changed the bin size to 100 and set the number of processors to max (up to 20 CPUs), but nothing really sped up the process.
The command I use: bamCoverage -bs 100 -b input.sort.bam -p max -o output.norm.bw --normalizeUsingRPKM
The log:
normalization: RPKM
minFragmentLength: 0
verbose: False
out_file_for_raw_data: None
numberOfSamples: None
bedFile: None
bamFilesList: ['/processed_NGS/input.sort.bam']
ignoreDuplicates: False
numberOfProcessors: 20
samFlag_exclude: None
save_data: False
stepSize: 100
smoothLength: None
center_read: False
defaultFragmentLength: read length
chrsToSkip: []
region: None
maxPairedFragmentLength: 1000
samFlag_include: None
binLength: 100
blackListFileName: None
maxFragmentLength: 0
minMappingQuality: None
zerosToNans: False
I do not know where the bug is; could anyone help? Many thanks in advance.