SAMtoBAM / MUMandCo

MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (only v4) to detect Structural Variation
GNU General Public License v3.0
64 stars 14 forks source link

multiple threads? #12

Closed davidaray closed 3 years ago

davidaray commented 3 years ago

Trying this out for a project. Seems to be running fine on two mammal genomes but it's taking quite a while to run.

Can this package be run on multiple processors to speed things up?

Thanks.

SAMtoBAM commented 3 years ago

Hi there,

The run time is mainly down to the nucmer alignment process unfortunately This is an issue with trying to maintain both MUMmer4 and MUMmer3 compatibility as multi-threading is only available with MUMmer4 I quickly generated a version for only MUMmer4 with a thread number option. I tested it on my side and the threading option appears to be working. Could you give it a whirl? "mumandco_v2.4.2_MUMmer4_multithreads.sh" The only modification is to add an option -t/--threads to the end of the command; eg: bash mumandco_v2.4.2_MUMmer4_multithreads.sh -r ./yeast.tidy.fa -q ./yeast_tidy_DEL100.fa -g 12500000 -o DEL100_test -t 10

davidaray commented 3 years ago

Thank you for the help. I made an attempt on the same mammal genomes I originally tried. By the way, those ran to completion. took about 30 hours.

Here is the output from the new run. The command was the same except for the addition of the -t option. $MUMCOPATH/mumandco_v2.4.2_MUMmer4_multithreads.sh -r assemblies/rAff.fa -q assemblies/rSed.fa -g 2500000000 -o rAffvrSed -t 36

Nucmer alignment of genomes, filtering and converting to coordinates

My what a large genome you have, this may take some time Unknown option: threads

USAGE: nucmer [options]

Try '/home/daray/conda/bin/nucmer -h' for more information. ERROR: Could not parse delta file, rAffvrSed_ref.delta error no: 400 ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter error no: 402 ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter error no: 402 Unknown option: threads

USAGE: nucmer [options]

Try '/home/daray/conda/bin/nucmer -h' for more information. ERROR: Could not parse delta file, rAffvrSed_query.delta error no: 400 ERROR: Could not parse delta file, rAffvrSed_query.delta_filter error no: 402 ERROR: Could not parse delta file, rAffvrSed_query.delta_filter error no: 402

Could this be a mummer version issue? I notice that this is for MUMmer4 and the version that ran successfully was v3.23.

In anticipation of this problem, I installed MUMmer4 using conda, 'conda install mummer4' .

$ conda list

packages in environment at /home/daray/conda/envs/mumco4:

#

Name Version Build Channel

_libgcc_mutex 0.1 main libgcc-ng 9.1.0 hdf63c60_0 libstdcxx-ng 9.1.0 hdf63c60_0 mummer4 4.0.0rc1 pl526he1b5a44_0 bioconda perl 5.26.2 h14c3975_0

It looks to me like mummer4 is installed properly.

Any ideas?

Thank you again.

--EDIT--

I just took a look at nucmer and, indeed, there is no -t option when looking at the help file.

Not sure if this makes a difference but.... $ nucmer -V nucmer NUCmer (NUCleotide MUMmer) version 3.1

SAMtoBAM commented 3 years ago

Well its nice to know it did finish within a 'reasonable' amount of time!

It seems based on your version output that it is still running MUMmer3

In the MUM&Co script the way it finds the nucmer aligner is through 'which nucmer' (and the other MUMmer tools respectively on lines 9-11) NUCMER=$(which nucmer) DELTAFILTER=$(which delta-filter) SHOWCOORDS=$(which show-coords) Therefore it goes after the nucmer command in your path You can edit this line directly to direct it to a newer installation of MUMmer4, eg: NUCMER=$(/home/user/mummer4/nucmer)

Or alternatively, build and install MUMmer4 as from the github (with 'sudo make install' at the end) Perhaps you have already tried that and it still didn't change the path?

davidaray commented 3 years ago

That seems to be doing the trick. It's running. I'll see if it finishes successfully. Regardless, thanks for being so responsive.

davidaray commented 3 years ago

Yep. That works and is a vast improvement. I ran a test using the same two genomes.

V1 used 36 processors. V2 used only one.

V1 finished in ~20 minutes. V2 is still running after 35 minutes. I expect it will take 20-30 hours based on the earlier run.

So, you might want to put this in the final version. It works.

Thanks for the help.

SAMtoBAM commented 3 years ago

Thanks for giving me the results. It makes a hell of a difference

Looks like i'll be adding this option permanently soon and mentioning that MUMmer4 will then be a requirement cheers

davidaray commented 3 years ago

Reopening this thread if that's ok.

I've come back to this after a while and have found there's a new version based on what we discussed above.

Unfortunately, this new version isn't working for me. I'm getting some very strange errors that I have no idea how to deal with.

These new errors seem to stem from this issue that comes up four times during a run of the test data.

g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory

I've done some troubleshooting and I think I've narrowed it down to these lines:

$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coords

$NUCMER --threads ${threads} --maxmatch --nosimplify -p ""$prefix"_query" $query_assembly $reference_assembly
$DELTAFILTER -m ""$prefix"_query".delta > ""$prefix"_query".delta_filter
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coords

There are four $SHOWCOORDS commands.

Any idea what might be going on here?