friend1ws / nanomonsv

SV detection tool for nanopore sequence reads
GNU General Public License v3.0

Run time and disk space required in version 0.5 #30

Closed waltergallegog closed 1 year ago

waltergallegog commented 1 year ago

Hello. In the past I was able to run nanomonsv version 0.4 successfully on my data (WGS, around 70 GB each for control and tumor).

I updated to version 0.5 and now I have 3 problems:

To make some comparisons, I ran nanomonsv v0.4 and v0.5 again on a smaller dataset containing only data from chromosome 1. These are the results:

| Metric | v0.4 | v0.5 |
| --- | --- | --- |
| Time for parsing tumor | 3 minutes | 3 minutes |
| Time for parsing control | 3 minutes | 3 minutes |
| Time for get | 12 minutes | 92 minutes |
| Number of variants detected | 15 | 30 |

Thanks for your feedback.

friend1ws commented 1 year ago

Thank you very much for your interest in nanomonsv.

> Is the increase in runtime expected? Is there any way to mitigate it?

From v0.5, we loosened the SV size threshold from 100 to 50. This increases the number of candidate SVs that have to be investigated. You can explicitly set --min_indel_size to 100. Alternatively, I recommend adding a control panel, which removes the common SVs observed in the population beforehand.
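As a rough illustration of the first suggestion, the sketch below rebuilds the `nanomonsv get` call quoted later in this thread with the threshold set back to 100. It only prints the command (a dry run). It assumes that `--min_indel_size` is accepted by the `get` subcommand; the file names are the placeholders used in the thread, and `MIN_INDEL_SIZE` and `CMD` are illustrative variable names.

```shell
# Dry-run sketch: assemble the `nanomonsv get` call from this thread with the
# pre-0.5 size threshold restored, then print it for review.
# Assumption: --min_indel_size is an option of the `get` subcommand.
MIN_INDEL_SIZE=100   # v0.4-style threshold; v0.5 lowered the default to 50

CMD="nanomonsv get tumor tumor.bam ref.fa \
--control_prefix ctrl --control_bam ctrl.bam \
--use_racon --min_indel_size ${MIN_INDEL_SIZE}"

# Print only; paste the printed line into a shell once it looks right.
echo "$CMD"
```

Printing first makes it easy to confirm the flag and value before committing to a long run.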

> Is the threads option safe? I'm currently rerunning with 28 threads using the option --threads 28, but there does not seem to be much improvement. The total CPU utilization of the process is around 140% according to htop.

Could you try --processes? Currently, we do not recommend using --threads.

> Have you noticed any similar disk space issues with version v0.5? Do you think the disk space problems are related to the number of files (inodes), or to the disk size available?

The problem is the partition that holds the /tmp directory. You can explicitly set TMPDIR to a directory under your home directory.

export TMPDIR={the appropriate directory of your home disk}
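A minimal sketch of that step, which also checks both free space and free inodes on the new location, since the question raised both possibilities. The directory name `nanomonsv_tmp` and the variable `SCRATCH_DIR` are hypothetical; any writable directory on a large enough partition would do.

```shell
# Hypothetical scratch location under the home directory; adjust as needed.
SCRATCH_DIR="${HOME}/nanomonsv_tmp"
mkdir -p "$SCRATCH_DIR"

# Point temporary files at the new location for this shell session.
export TMPDIR="$SCRATCH_DIR"

# Free space on the partition that now holds temporary files.
df -h "$TMPDIR"

# Free inodes on the same partition (some filesystems do not report these).
df -i "$TMPDIR"
```

The export only affects the current shell and its child processes, so nanomonsv should be started from the same session.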

> I would like to run v0.5 with my entire WGS data, as the new version is detecting more variants, but the run time is prohibitively long.

I would appreciate it if you could try the items I pointed out above and let me know if the problem persists.

waltergallegog commented 1 year ago

Thank you for the quick feedback.

tumor.nanomonsv.result_human_28process.csv tumor.nanomonsv.result_human_1process.csv

The command I used is:

nanomonsv get tumor tumor.bam ref.fa --control_prefix ctrl --control_bam ctrl.bam --use_racon --processes 28

Let me know if more information would help (for example intermediate files, or a rerun with some debug option enabled).

BR.

friend1ws commented 1 year ago

Thanks. I will also look at some data here to see whether the results differ depending on the --processes setting. Just in case: the missing variants had the Too_low_VAF flag, so I think you can safely remove them.
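One way to drop such rows before comparing two result files is a plain text filter. The sketch below assumes the flag appears as literal text somewhere on the row; the two-column toy file is made up for illustration and does not reflect the real nanomonsv output layout.

```shell
# Toy stand-in for a result file; the rows and columns are made up.
RESULT="$(mktemp)"
printf '%s\n' \
  'sv_1,PASS' \
  'sv_2,Too_low_VAF' \
  'sv_3,PASS' > "$RESULT"

# Keep every row that does not mention the Too_low_VAF flag.
FILTERED="$(mktemp)"
grep -v 'Too_low_VAF' "$RESULT" > "$FILTERED"

# Number of rows that survive the filter.
wc -l < "$FILTERED"
```

Applying the same filter to both files before diffing them leaves only the differences that are not explained by this flag.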