PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.

pbsv multisample poor performance? #636

Closed (leone93 closed this issue 5 months ago)

leone93 commented 7 months ago

Hi all,

I'm running pbsv in multisample mode, passing the *.svsig.gz files of all 180 of my samples through a list (fofn). I'm using this command:

    pbsv call \
      --hifi -m 50 --max-ins-length 100K -j "$thr" \
      "$REF" \
      pbsv_pbmm2_nippo_multisample.fofn \
      multisample.nippo.pbmm2.pbsv.vcf

The job seems to start without errors, uses all the cores I give it (50 in this case), and creates a VCF with all the sample names inside. After a couple of hours, however, it drops to a single core and stays stuck at that step for at least a month (whatever that step is; the software prints no error message). I never even see an output, because at that point my administrator has to reboot the workstation, and I have tried on different workstations.

So I can't tell whether the program is simply very poorly optimized or whether it is hitting a problem that it doesn't report. I suspect the former. I'm using the latest version of the software, 2.9.0.

If you have any suggestions, please let me know. Thanks.
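Since a run like this can occupy a workstation for weeks, it may be worth ruling out a bad input list before launching it. The following is only a minimal sketch and does not address pbsv's own performance: `check_fofn` is a hypothetical helper (not part of pbsv) that reads a list file and checks that every listed path exists and is non-empty. It assumes one path per line, with relative paths resolved against the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pbsv: sanity-check a file-of-file-names (fofn).
# Prints the number of usable entries on stdout and reports bad ones on stderr.
# Returns non-zero if any listed path is missing or empty.
check_fofn() {
  local fofn="$1" path count=0 bad=0
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue          # skip blank lines
    if [ -s "$path" ]; then             # exists and has size > 0
      count=$((count + 1))
    else
      echo "missing or empty: $path" >&2
      bad=$((bad + 1))
    fi
  done < "$fofn"
  echo "$count"
  [ "$bad" -eq 0 ]
}
```

With the list from the post, `check_fofn pbsv_pbmm2_nippo_multisample.fofn` should print 180 and return success if all 180 signature files are present; only then would the `pbsv call` command be started.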

armintoepfer commented 5 months ago

We are aware that multi-sample calling is not optimal, and we are working on a new tool. Pinging @ctsa for visibility.