Open dandanWang2019 opened 3 months ago
Hi Dandan!
First, I want to preface that I'm neither an author nor a collaborator of Cactus. I just use it a lot for my own research and wanted to help.
That said, Dent Earl et al. have a program called mafTools that lets users work more directly with the MAF file. In particular, the mafToFastaStitcher command will allow you to convert your MAF to a FASTA alignment, which can then be converted to PHYLIP (if desired). Most tree-building software can handle alignments in FASTA format, so you might not need to convert to PHYLIP.
Also, I'd make sure to read through the rest of the components that mafTools has to offer.
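If a PHYLIP file does turn out to be necessary, the FASTA-to-PHYLIP step is small enough to sketch by hand. The following is only an illustration, assuming a relaxed sequential PHYLIP layout (a header line with the number of sequences and the alignment length, then one "name sequence" line per record); `read_fasta` and `to_phylip` are hypothetical helper names and not part of mafTools or Cactus.

```python
def read_fasta(lines):
    """Parse FASTA text lines into an ordered list of (name, sequence)."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned records as relaxed sequential PHYLIP text."""
    if not records:
        raise ValueError("no sequences to write")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        # PHYLIP requires every row to have the same number of columns.
        raise ValueError(f"sequences are not aligned; lengths seen: {sorted(lengths)}")
    header = f"{len(records)} {lengths.pop()}"
    body = [f"{name} {seq}" for name, seq in records]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    example = [">ref", "ACGT-", "ACG", ">seq1", "AC-TTACG"]
    print(to_phylip(read_fasta(example)), end="")
```

The length check is deliberate: a FASTA whose records differ in length is not a valid alignment, and it is better to fail here than inside the tree-building software.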
Hope this helps!
Hi Emmarie,
Thanks! It works! I will close this.
I reopened this because the sequences of the different species (converted with mafToFastaStitcher) are not of equal length. The reference genome is a little shorter than the others, so the tree-building software always reports an error for this FASTA.
I hope there is a solution.
Hello, I am running the mafToFastaStitcher command with the test data:
/data/01/p1/user157/software/mafTools/bin/mafToFastaStitcher -m input.maf --seqs input.fa --breakpointPenalty 5 --outMfa output.mfa
The input.maf is:
##maf version=1
a score=0.0 status=test.input
s ref.chr1 10 10 + 100 ACGTACGTAC
s seq1.chr@ 0 10 + 100 AAAAAAAAAA
s seq2.chr& 10 5 + 100 -----CCCCC
s seq6.chr1 10 5 + 100 -----GGGGG
s seq7.chr20 0 5 + 100 AAAAA-----

a score=0.0 status=test.input
s ref.chr1 20 10 + 100 GTACGTACGT
s seq2.chr!! 5 5 + 100 CCCCC-----
s seq3.chr0 20 5 + 100 -----GGGGG
s seq6.chr1 22 5 + 100 GGGGG-----

a score=0.0 status=test.input
s ref.chr1 30 10 + 100 ACGTACGTAC
s seq4.chr1 0 5 - 100 GG-----GGG
s seq5.chr2 0 10 + 100 CCCCCCCCCC

The input.fa is:
>ref.chr1
ggggggggggACGTACGTACGTACGTACGTACGTACGTACgg
>seq1.chr@
AAAAAAAAAAgg
>seq2.chr&
aaaaaaaaaaCCCCCaa
>seq2.chr!!
aaaaaCCCCCaa
>seq3.chr0
aaaaaaaaaaaaaaaaaaaGGGGGaa
>seq4.chr1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaCCCCC
>seq6.chr1
aaaaaaaaaGGGGGaaaaaaaGGGGGaa
>seq7.chr20
AAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAATT

However, I got the error:
[1] 3382482 abort (core dumped) /data/01/p1/user157/software/mafTools/bin/mafToFastaStitcher -m input.maf
I tried splitting input.fa into seq1.fa, seq2.fa, seq3.fa, seq4.fa, seq6.fa and seq7.fa and running the command:
/data/01/p1/user157/software/mafTools/bin/mafToFastaStitcher -m input.maf --seqs ref.fa,seq1.fa,seq2.fa,seq3.fa,seq4.fa,seq6.fa,seq7.fa --breakpointPenalty 5 --outMfa output.mfa
but I got the same error: abort (core dumped)
mafToFastaStitcher compiles correctly and passes make test:
gcc -std=c99 -Wno-unused-but-set-variable -c src/mafToFastaStitcherAPI.c -o test/mafToFastaStitcherAPI.o.tmp -O3 -Wall -Werror --pedantic -funro$
l-loops -DNDEBUG -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -I ../../sonLib/lib -I ../inc -I ../external -lm
mv test/mafToFastaStitcherAPI.o.tmp test/mafToFastaStitcherAPI.o
mkdir -p test/
gcc -std=c99 -Wno-unused-but-set-variable -c src/buildVersion.c -o test/buildVersion.o.tmp -O0 -g -Wall -Werror --pedantic -I ../../sonLib/lib -$
../inc -I ../external
mv test/buildVersion.o.tmp test/buildVersion.o
mkdir -p test/
gcc -std=c99 -Wno-unused-but-set-variable -c src/test.mafToFastaStitcherAPI.c -o test/test.mafToFastaStitcherAPI.o.tmp -O0 -g -Wall -Werror --pe$
antic -I ../../sonLib/lib -I ../inc -I ../external
mv test/test.mafToFastaStitcherAPI.o.tmp test/test.mafToFastaStitcherAPI.o
mkdir -p test/
gcc -std=c99 -Wno-unused-but-set-variable src/allTests.c test/sharedMaf.o test/common.o ../external/CuTest.a test/mafToFastaStitcherAPI.o ../../$
onLib/lib/sonLib.a test/buildVersion.o test/test.mafToFastaStitcherAPI.o -o test/allTests.tmp -O0 -g -Wall -Werror --pedantic -I ../../sonLib/li$
-I ../inc -I ../external -lm
mv test/allTests.tmp test/allTests
mkdir -p test/
gcc -std=c99 -Wno-unused-but-set-variable src/mafToFastaStitcher.c test/sharedMaf.o test/common.o ../external/CuTest.a test/mafToFastaStitcherAPI
.o ../../sonLib/lib/sonLib.a test/buildVersion.o -o test/mafToFastaStitcher.tmp -O0 -g -Wall -Werror --pedantic -I ../../sonLib/lib -I ../inc -I
../external -lm
mv test/mafToFastaStitcher.tmp test/mafToFastaStitcher
./test/allTests && python2.7 src/test.mafToFastaStitcher.py --verbose && rm -rf ./test/ && rmdir ./tempTestDir
Running test case test_readingFasta_0
Running test case test_newBlockHashFromBlock_0
Running test case test_addMafLineToRow_0
Running test case test_addMafLineToRow_1
Running test case test_penalize_0
Running test case test_interstitial_0
Running test case test_addBlockToHash_0
Running test case test_addBlockToHash_1
Running test case test_addBlockToHash_2
Running test case test_addBlockToHash_3
Running test case test_addBlockToHash_4
Running test case test_addBlockToHash_5
Running test case test_addBlockToHash_6
.............
OK (13 tests)
testAllTests (main.CuTest) If valgrind is installed on the system, check for memory related errors in CuTests ... ok
testFastaStitch (main.FastaStitchTest) mafToFastaStitcher should produce known output for a given known input ... ok
testMemory1 (main.FastaStitchTest) If valgrind is installed on the system, check for memory related errors (1). ... ok
Ran 3 tests in 19.287s
OK
Could you give me any suggestions? Looking forward to your reply.
Best wishes,
Na Wan
Hi Dandan (@dandanWang2019),
It's hard to say what the problem is. I had a similar issue once, and here's what one of the authors had to say. Based on that, you can try using --gapFill 0 when converting from HAL to MAF. It's possible that additional gaps are being inserted into your reference sequence during the conversion.
Also, when you converted your HAL to MAF, what --dupeMode parameter did you set? Is it possible that some of the other sequences have more duplications written into their FASTA sequence than the reference does? You can look at your original MAF alignment and see whether there are multiple alignment lines for the same species within a block. If so, then I'd use mafDuplicateFilter from mafTools to filter those (unless you want to preserve them).
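Scanning a large MAF by eye for repeated species is tedious, so here is a rough sketch of how that check could be scripted. It assumes the usual "species.chromosome" naming in the MAF "s" lines (the species is taken as everything before the first dot), and `duplicated_species` is a hypothetical helper, not something shipped with mafTools.

```python
from collections import Counter


def duplicated_species(maf_text):
    """Return {block index: {species: row count}} for blocks with repeated species."""
    blocks = []  # one list of species names per alignment block
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            blocks.append([])
        elif fields[0] == "s" and len(fields) >= 2 and blocks:
            # Assumption: names look like "species.chromosome".
            blocks[-1].append(fields[1].split(".", 1)[0])
    report = {}
    for index, names in enumerate(blocks):
        repeats = {sp: n for sp, n in Counter(names).items() if n > 1}
        if repeats:
            report[index] = repeats
    return report


if __name__ == "__main__":
    maf = """##maf version=1
a score=0
s ref.chr1 0 4 + 10 ACGT
s sp1.chr1 0 4 + 10 ACGT

a score=0
s ref.chr1 4 4 + 10 TTAA
s sp1.chr1 4 4 + 10 TTAA
s sp1.chr2 0 4 + 10 TTAT
"""
    print(duplicated_species(maf))
```

An empty result means every block has at most one row per species, in which case duplications are probably not the reason the other sequences come out longer than the reference.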
Hi Na (@aaannaw),
As I mentioned in my earliest reply, I just want to let you know that I'm not affiliated with either Cactus or mafTools - I'm just a user.
I'm not entirely sure what the problem is, but I suspect that it is related to you providing multiple sequence FASTAs in the second command. Based on the MAF block and input.fa you shared, it seems that the input.fa already contains all of the sequences in the MAF block?
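One way to confirm that is to compare the sequence names in the MAF "s" lines against the FASTA headers. The sketch below does only that comparison; the helper names are hypothetical, and a missing name is just something worth looking at, not a confirmed cause of the crash.

```python
def maf_sequence_names(maf_text):
    """Collect the source names from the 's' lines of MAF text."""
    names = set()
    for line in maf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "s":
            names.add(fields[1])
    return names


def fasta_sequence_names(fasta_text):
    """Collect the first word of each FASTA header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def missing_from_fasta(maf_text, fasta_text):
    """Return MAF sequence names that have no matching FASTA record."""
    return sorted(maf_sequence_names(maf_text) - fasta_sequence_names(fasta_text))


if __name__ == "__main__":
    maf = "a score=0\ns ref.chr1 0 4 + 10 ACGT\ns sp1.chr1 0 4 + 10 ACGT\n"
    fasta = ">ref.chr1\nACGTACGTAC\n"
    print(missing_from_fasta(maf, fasta))
```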
I'd also consider making an issue on mafTools if the issue persists.
Hello, emistasis. The MAF and input.fa shown are both from the test data (https://github.com/dentearl/mafTools/tree/master/mafToFastaStitcher). However, I could not get it to work. I have asked the author of mafToFastaStitcher for help, but received no reply. Maybe other tools could convert MAF to FASTA, but I don't know of any.
Best wishes!
Na Wan
This seems like a recurring issue. There was a more recent issue with a similar problem. You could give SEGUL a try. We don't support a FASTA reference yet, but it can get the name from a BED file. The feature is in beta now. You will need to compile it yourself, but it should work regardless. Feel free to report issues in the SEGUL repo.
Hi,
The alignment is very fast in my case, which is based on one chromosome from different species. Thanks for the great work. I am going to build a phylogenetic tree from the alignments, so I used "cactus-hal2maf" to convert the HAL to MAF format and then to PHYLIP format. The alignments seem to be output only in blocks.
Does anyone know how to get a whole aligned sequence for each species, rather than sequences within blocks, or how else to solve this in order to build a tree?
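For illustration, the simplest way to turn blocks into one row per species is to concatenate the blocks in file order and pad any species absent from a block with gaps. The sketch below does exactly that and nothing more. It is not the algorithm used by mafToFastaStitcher or any Cactus tool: it ignores coordinates, strand, and unaligned sequence between blocks, keeps only the first row per species within a block, and assumes "species.chromosome" naming. `concatenate_blocks` is a made-up name.

```python
def concatenate_blocks(maf_text):
    """Concatenate MAF blocks column-wise into one gapped row per species."""
    blocks = []  # one {species: aligned text} dict per block
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            blocks.append({})
        elif fields[0] == "s" and len(fields) >= 7 and blocks:
            species = fields[1].split(".", 1)[0]
            # Keep only the first row per species; later duplicates are dropped.
            blocks[-1].setdefault(species, fields[6])

    # Remember species in order of first appearance.
    species_order = []
    for block in blocks:
        for species in block:
            if species not in species_order:
                species_order.append(species)

    rows = {species: [] for species in species_order}
    for block in blocks:
        if not block:
            continue
        # All rows in a MAF block have the same number of columns.
        width = len(next(iter(block.values())))
        for species in species_order:
            rows[species].append(block.get(species, "-" * width))
    return {species: "".join(parts) for species, parts in rows.items()}


if __name__ == "__main__":
    maf = """##maf version=1
a score=0
s ref.chr1 0 4 + 10 ACGT
s sp1.chr1 0 3 + 10 A-GT

a score=0
s ref.chr1 4 3 + 10 TTA
s sp2.chr9 0 3 + 10 TCA
"""
    for name, seq in concatenate_blocks(maf).items():
        print(f">{name}\n{seq}")
```

Because every species receives either its aligned text or a run of gaps for each block, all output rows have the same length, which is what tree-building software expects from an alignment.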