Closed by af8 3 years ago
pavfinder should be spawning `nproc` processes to expedite the support_reads finding step. This is probably evident in the error message of your previous issue (#13), which indicates that multiple batches were created for the find_support step. Without multiprocessing, this step is very slow and pavfinder would take a very long time to finish. Are you sure only one process was used from start to finish? Does your final output have a non-empty `spanning_reads` column (which would indicate that this step was run)?
Yes, I do have some support information in the `spanning_reads` column. Example with a reduced display:
chrom1 | end1 | chrom2 | end2 | event | gene1 | gene2 | in_frame | spanning_reads | flanking_pairs |
---|---|---|---|---|---|---|---|---|---|
chr18 | 26032399 | chrX | 52700578 | fusion | SS18 | SSX2 | True | 84 | 6 |
For this example, on 140M read pairs, pavfinder ran in 10h10m42s with 20 cores supplied, using 114.3% CPU overall and 190 GB of memory.
So I suspect it uses the 20 cores only for short periods and, for the most part, uses a single core for the main Python wrapper script?
In your experience, what is the mean runtime of the pavfinder step?
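As a rough reading of the numbers above, the average CPU figure can be converted into an estimate of how long all cores were busy. The sketch below assumes a simple two-phase model (my assumption, not something pavfinder reports): at any moment the run uses either exactly one core or all 20. The function names are illustrative only.

```python
def parallel_fraction(avg_cores: float, nproc: int) -> float:
    """Fraction of wall time spent on all `nproc` cores.

    Two-phase model (an assumption): a fraction f of the wall time uses
    nproc cores and the rest uses one core, so
        avg_cores = (1 - f) * 1 + f * nproc
    which gives f = (avg_cores - 1) / (nproc - 1).
    """
    return (avg_cores - 1.0) / (nproc - 1)


def parallel_wall_seconds(wall_seconds: float, avg_cores: float, nproc: int) -> float:
    """Wall-clock seconds spent in the fully parallel phase under the same model."""
    return wall_seconds * parallel_fraction(avg_cores, nproc)


if __name__ == "__main__":
    wall = 10 * 3600 + 10 * 60 + 42   # 10h10m42s from the run above
    avg_cores = 114.3 / 100           # 114.3% CPU overall
    f = parallel_fraction(avg_cores, 20)
    secs = parallel_wall_seconds(wall, avg_cores, 20)
    print(f"parallel fraction of wall time: {f:.4f}")
    print(f"parallel wall time: {secs / 60:.1f} min")
```

Under this model, only a few minutes of the ten-hour run would have used all 20 cores. If the parallel phase actually uses fewer than 20 cores at a time, it would occupy a correspondingly larger share of the wall time.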
For the most part PAVFinder runs as a single process, but during the last stage, gathering support, it spawns multiple processes to iterate through the r2c BAM file. It's crucial that this step runs in parallel; otherwise it would not complete even within those 10 hours.
Will close the issue for now unless there is more feedback.
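To illustrate the general pattern described here (one main process that fans a single stage out to worker processes), here is a minimal sketch using Python's standard `multiprocessing.Pool`. It is not PAVFinder's actual code: `make_batches`, `count_support`, and `find_support` are hypothetical names, and the per-batch work is a placeholder rather than real BAM parsing.

```python
from multiprocessing import Pool


def make_batches(items, nproc):
    """Split `items` into at most `nproc` batches of roughly equal size."""
    size = max(1, -(-len(items) // nproc))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_support(batch):
    """Placeholder per-batch work (hypothetical).

    A real tool would scan alignments for each event here; this sketch
    just records the length of each event name.
    """
    return {event: len(event) for event in batch}


def find_support(events, nproc):
    """Run `count_support` over the batches in parallel and merge the results."""
    batches = make_batches(events, nproc)
    with Pool(processes=nproc) as pool:
        partial_results = pool.map(count_support, batches)
    merged = {}
    for result in partial_results:
        merged.update(result)
    return merged


if __name__ == "__main__":
    events = [f"event_{i}" for i in range(10)]
    print(find_support(events, nproc=4))
```

With a design like this, only the `find_support` stage benefits from extra cores, which is consistent with a run that looks single-threaded for most of its wall time.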
Not really. Just for completeness, I have CPU stats and runtimes for 30 RNA-seq samples containing between 120 and 170 million read pairs, plus one outlier of 400 million pairs.
But I have not carefully checked whether PAVFinder fully uses the 20 threads during some periods.
Hi @readmanchiu
I usually run FusionBloom with 20 cores. While I noticed that the c2t, c2g and r2c steps make (almost) full use of the given cores, pavfinder only uses one CPU even though `--nproc 20` is given. Is it really useful, or can I reduce the number of cores to 2 without penalizing the runtime of this step?
Thanks, Anthony