waltergallegog closed this issue 1 year ago.
Thank you very much for your interest in nanomonsv.
Is the increase in runtime expected? Is there any way to mitigate it ?
From v0.5, we loosened the minimum SV size threshold from 100 to 50, which increases the number of candidate SVs to investigate. You may explicitly set --min_indel_size to 100. Alternatively, I recommend adding a control panel (which removes common SVs observed in the population beforehand).
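For example, the pre-v0.5 threshold can be restored by passing the flag mentioned above (the file paths below are placeholders; adjust them to your data):

```shell
# Restore the pre-v0.5 minimum SV size threshold of 100 bp
nanomonsv get tumor tumor.bam ref.fa \
    --control_prefix ctrl --control_bam ctrl.bam \
    --min_indel_size 100
```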
Is the threads option safe? I'm currently rerunning with 28 threads using the option --threads 28, but there does not seem to be much improvement. The total CPU utilization of the process is around 140% according to htop.
Could you try --processes? Currently, we do not recommend using --threads.
Have you noticed any similar disk space issues with version v0.5? Do you think the disk space problems are related to the number of files (inodes) or to the available disk size?
The problem is the partition of the /tmp directory. You could explicitly set TMPDIR to a directory under your home directory:
export TMPDIR={the appropriate directory of your home disk}
I would like to run v0.5 with my entire WGS data, as the new version is detecting more variants, but the run time is prohibitively long.
I would appreciate it if you could try the items I pointed out above and let me know if the problem persists.
Thank you for the quick feedback.
I tried the --processes option with the small dataset of 1 chromosome, and now the CPU utilization is as expected.
The only problem is that two of the variants that were detected with 1 process are not detected when using the multi-process option (28 in this case). Attached are the CSV files with the variants. The missing ones are:
1 143254717 d_110 A <DEL> . Too_low_VAF
1 143264704 i_388 C <INS> . Too_low_VAF
tumor.nanomonsv.result_human_28process.csv tumor.nanomonsv.result_human_1process.csv
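If it helps with debugging, the two attached result files can be compared with standard tools to list the calls unique to the 1-process run. A sketch (the sample rows below are made up to stand in for the attached files, keyed on chromosome, position, and SV ID):

```shell
# Stand-ins for the two attached result files, keyed on chrom/pos/ID
printf '1\t143254717\td_110\n1\t143264704\ti_388\n1\t150000000\td_200\n' > result_1process.tsv
printf '1\t150000000\td_200\n' > result_28process.tsv

# comm -23 prints lines present only in the first (sorted) input,
# i.e. variants found with 1 process but missing with 28 processes
comm -23 <(sort result_1process.tsv) <(sort result_28process.tsv)
```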
The command I used is:
nanomonsv get tumor tumor.bam ref.fa --control_prefix ctrl --control_bam ctrl.bam --use_racon --processes 28
Let me know if more information would help (such as intermediate files, or a rerun with some debug option enabled, etc.).
Thanks for your advice on the minimum length parameter and the use of a control panel. I will check the values for my use case.
BR.
Thanks.
I will also take a look at some of the data here to see whether the results differ depending on the --processes setting.
Just in case: the missing variants had the Too_low_VAF flag, and I guess you can safely remove them.
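Following that suggestion, the Too_low_VAF calls can be dropped with a one-line awk filter. A sketch, using made-up rows that mirror the result lines quoted earlier (the flag sits in the seventh tab-separated column there):

```shell
# Stand-in for the real nanomonsv result file; column 7 carries the flag
printf '1\t143254717\td_110\tA\t<DEL>\t.\tToo_low_VAF\n1\t200000000\td_300\tT\t<DEL>\t.\tPASS\n' > result.tsv

# Keep only rows whose flag column is not Too_low_VAF
awk -F'\t' '$7 != "Too_low_VAF"' result.tsv > result.filtered.tsv
cat result.filtered.tsv
```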
Hello, in the past I was able to run nanomonsv version 0.4 successfully with my data (WGS of around 70GB each for control and tumor).
I updated to version 0.5 and now I have 3 problems:
To make some comparisons, I have run nanomonsv v0.4 and v0.5 again with a smaller dataset containing only data from chromosome 1. These are the results:
1. Is the increase in runtime expected? Is there any way to mitigate it?
2. Is the --threads option safe? I'm currently rerunning with 28 threads using the option --threads 28, but there does not seem to be much improvement. The total CPU utilization of the process is around 140% according to htop.
3. Have you noticed any similar disk space issues with version v0.5? Do you think the disk space problems are related to the number of files (inodes) or to the available disk size?
I would like to run v0.5 with my entire WGS data, as the new version is detecting more variants, but the run time is prohibitively long.
Thanks for your feedback.