Closed davidaray closed 3 years ago
Hi there,
The run time is mainly down to the nucmer alignment process unfortunately
This is an issue with trying to maintain both MUMmer4 and MUMmer3 compatibility as multi-threading is only available with MUMmer4
I quickly generated a version for only MUMmer4 with a thread number option. I tested it on my side and the threading option appears to be working. Could you give it a whirl? "mumandco_v2.4.2_MUMmer4_multithreads.sh"
The only modification is to add an option -t/--threads to the end of the command;
eg:
bash mumandco_v2.4.2_MUMmer4_multithreads.sh -r ./yeast.tidy.fa -q ./yeast_tidy_DEL100.fa -g 12500000 -o DEL100_test -t 10
Thank you for the help. I made an attempt on the same mammal genomes I originally tried. By the way, those ran to completion. took about 30 hours.
Here is the output from the new run. The command was the same except for the addition of the -t option. $MUMCOPATH/mumandco_v2.4.2_MUMmer4_multithreads.sh -r assemblies/rAff.fa -q assemblies/rSed.fa -g 2500000000 -o rAffvrSed -t 36
Nucmer alignment of genomes, filtering and converting to coordinates
My what a large genome you have, this may take some time Unknown option: threads
USAGE: nucmer [options]
Try '/home/daray/conda/bin/nucmer -h' for more information. ERROR: Could not parse delta file, rAffvrSed_ref.delta error no: 400 ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter error no: 402 ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter error no: 402 Unknown option: threads
USAGE: nucmer [options]
Try '/home/daray/conda/bin/nucmer -h' for more information. ERROR: Could not parse delta file, rAffvrSed_query.delta error no: 400 ERROR: Could not parse delta file, rAffvrSed_query.delta_filter error no: 402 ERROR: Could not parse delta file, rAffvrSed_query.delta_filter error no: 402
Could this be a mummer version issue? I notice that this is for MUMmer4 and the version that ran successfully was v3.23.
In anticipation of this problem, I installed MUMmer4 using conda, 'conda install mummer4' .
$ conda list
#
_libgcc_mutex 0.1 main libgcc-ng 9.1.0 hdf63c60_0 libstdcxx-ng 9.1.0 hdf63c60_0 mummer4 4.0.0rc1 pl526he1b5a44_0 bioconda perl 5.26.2 h14c3975_0
It looks to me like mummer4 is installed properly.
Any ideas?
Thank you again.
--EDIT--
I just took a look at nucmer and, indeed, there is no -t option when looking at the help file.
Not sure if this makes a difference but.... $ nucmer -V nucmer NUCmer (NUCleotide MUMmer) version 3.1
Well its nice to know it did finish within a 'reasonable' amount of time!
It seems based on your version output that it is still running MUMmer3
In the MUM&Co script the way it finds the nucmer aligner is through 'which nucmer' (and the other MUMmer tools respectively on lines 9-11)
NUCMER=$(which nucmer) DELTAFILTER=$(which delta-filter) SHOWCOORDS=$(which show-coords)
Therefore it goes after the nucmer command in your path
You can edit this line directly to direct it to a newer installation of MUMmer4, eg:
NUCMER=$(/home/user/mummer4/nucmer)
Or alternatively, build and install MUMmer4 as from the github (with 'sudo make install' at the end) Perhaps you have already tried that and it still didn't change the path?
That seems to be doing the trick. It's running. I'll see if it finishes successfully. Regardless, thanks for being so responsive.
Yep. That works and is a vast improvement. I ran a test using the same two genomes.
V1 used 36 processors. V2 used only one.
V1 finished in ~20 minutes. V2 is still running after 35 minutes. I expect it will take 20-30 hours based on the earlier run.
So, you might want to put this in the final version. It works.
Thanks for the help.
Thanks for giving me the results. It makes a hell of a difference
Looks like i'll be adding this option permanently soon and mentioning that MUMmer4 will then be a requirement cheers
Reopening this thread if that's ok.
I've come back to this after a while and have found there's a new version based on what we discussed above.
Unfortunately, this new version isn't working for me. I'm getting some very strange errors that I have no idea how to deal with.
These new errors seem to stem from this issue that comes up four times during a run of the test data.
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
I've done some troubleshooting and I think I've narrowed it down to these lines:
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coords
$NUCMER --threads ${threads} --maxmatch --nosimplify -p ""$prefix"_query" $query_assembly $reference_assembly
$DELTAFILTER -m ""$prefix"_query".delta > ""$prefix"_query".delta_filter
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coords
There are four $SHOWCOORDS commands.
Any idea what might be going on here?
Trying this out for a project. Seems to be running fine on two mammal genomes but it's taking quite a while to run.
Can this package be run on multiple processors to speed things up?
Thanks.