snmrna opened this issue 6 years ago.
Hi,
GraftM isn't regularly tested on OSX, so it is possible that this is the cause.
But the problem does seem to be specific to fasttree. Would you mind running something like the following to test, please?
mafft input_proteins.faa >aligned.faa
fasttree -log fasttree.log -out fasttree.tree aligned.faa
echo $?
Thanks, ben
Hi wwood, thanks very much for your quick reply! I ran the test as you suggested and got the following output:
weibintekiMacBook-Air:GraftM weibin$ mafft GUS.fasta >aligned.fasta
nthread = 0
nthreadpair = 0
nthreadtb = 0
stacksize: 8192 kb
Gap Penalty = -1.53, +0.00, +0.00
Making a distance matrix ..
There are 170 ambiguous characters. 901 / 923 done.
Constructing a UPGMA tree (efffree=0) ... 920 / 923 done.
Progressive alignment 1/2... STEP 801 / 922 f Reallocating..done. alloclen = 6277 STEP 901 / 922 h Reallocating..done. alloclen = 7428
done.
Making a distance matrix from msa.. 900 / 923 done.
Constructing a UPGMA tree (efffree=1) ... 920 / 923 done.
Progressive alignment 2/2... STEP 901 / 922 h Reallocating..done. *alloclen = 5961
Reallocating..done. *alloclen = 8199
done.
disttbfast (aa) Version 7.394 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0 0 thread(s)
Strategy: FFT-NS-2 (Fast but rough) Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.
weibintekiMacBook-Air:GraftM weibin$ fasttree -log fasttree.log -out fasttree.tree aligned.fasta
FastTree Version 2.1.10 SSE3
Alignment: aligned.fasta
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Ignored unknown character X (seen 170 times)
Segmentation fault: 11
weibintekiMacBook-Air:GraftM weibin$ echo $?
139
Do you know how to fix it? Sorry, I'm not very experienced. @wwood I succeeded in creating a gpkg when I replaced the amino acid sequences in the FASTA file with gene sequences.
Thanks for running that. It indeed points to an issue with fasttree, rather than the GraftM code itself not working on OSX for some reason.
I'm not sure why, but fasttree is crashing while building your tree, here:
Ignored unknown character X (seen 170 times) Segmentation fault: 11
I've not seen this issue before. You are already running the newest version of FastTree, so updating it won't fix the problem.
I suspect, then, that either something went wrong when fasttree was compiled, or there is something particular about your sequences, e.g. a sequence made up exclusively of X characters, or perhaps something more subtle. You could try building the tree on Linux, or removing sequences from the alignment until it no longer segfaults.
Good luck. ben
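Following the suggestion above to look for a sequence made up entirely of X characters, here is a minimal sketch (Python 3, not part of GraftM or FastTree) that reports, for each sequence in an alignment such as aligned.fasta, how many X characters it contains, and flags sequences with no informative residues at all. The script name check_x.py and the function names are hypothetical, and treating `-` and `.` as gap characters is an assumption.

```python
import sys


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise(records, unknown="X", gap_chars="-."):
    """Return (header, n_unknown, n_informative) for each record.

    Assumption: anything that is neither the unknown character nor a gap
    character counts as an informative residue.
    """
    report = []
    for header, seq in records:
        seq = seq.upper()  # count lowercase x as well
        n_unknown = seq.count(unknown)
        n_gap = sum(seq.count(c) for c in gap_chars)
        report.append((header, n_unknown, len(seq) - n_unknown - n_gap))
    return report


def main(path):
    with open(path) as handle:
        report = summarise(parse_fasta(handle))
    # Sequences with the fewest informative residues are listed first.
    for header, n_unknown, n_informative in sorted(report, key=lambda r: (r[2], -r[1])):
        if n_informative == 0:
            print("NO INFORMATIVE RESIDUES\t" + header)
        elif n_unknown:
            print("%d X\t%s" % (n_unknown, header))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python check_x.py ALIGNMENT.fasta", file=sys.stderr)
```

Run it as `python check_x.py aligned.fasta`. Any sequence flagged "NO INFORMATIVE RESIDUES" is a good first candidate to remove before rerunning fasttree.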
Thanks for your suggestions! It was indeed a fasttree problem. I fixed it by recompiling FastTree with a different command:
gcc -DNO_SSE -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c -lm
But I still have another question. I have successfully installed GraftM on a Windows netbook (checked in Python) and added its path to the environment variables, but I can't run graftM from cmd. Do you know how to set up my netbook to run GraftM?
Sorry to bother you again. @wwood
When I created my own gpkg on a Mac (input file: ~900 amino acid sequences), the gpkg for my proteins seems to have been built, but an error is still reported at the step "Testing gpkg package works". Do you know what is wrong and how to fix it?
Here is the report:
04/16/2018 04:48:39 PM INFO: Building gpkg for GUS923.gpkg
04/16/2018 04:48:39 PM INFO: Building seqinfo and taxonomy file from input taxonomy
04/16/2018 04:48:39 PM INFO: Checking for duplicate sequences
04/16/2018 04:48:39 PM INFO: Aligning sequences to create aligned FASTA file
04/16/2018 04:48:55 PM INFO: Building HMM from alignment
04/16/2018 04:49:02 PM INFO: Filtered 0 short sequences from the alignment
04/16/2018 04:49:02 PM INFO: 923 sequences remaining
04/16/2018 04:49:02 PM INFO: Checking for incorrect or fragmented reads
04/16/2018 04:49:17 PM INFO: Building HMM from alignment
04/16/2018 04:49:24 PM INFO: Filtered 0 short sequences from the alignment
04/16/2018 04:49:24 PM INFO: 923 sequences remaining
04/16/2018 04:49:24 PM INFO: Deduplicating sequences
04/16/2018 04:49:24 PM INFO: Removed 34 sequences as duplicates, leaving 889 non-identical sequences
04/16/2018 04:49:24 PM INFO: Building tree
04/16/2018 04:51:32 PM INFO: Building seqinfo and taxonomy file from input taxonomy
04/16/2018 04:51:32 PM INFO: Creating reference package
04/16/2018 04:51:32 PM INFO: Attempting to run taxit create with rerooting capabilities
04/16/2018 04:51:34 PM INFO: Creating diamond database
04/16/2018 04:51:34 PM INFO: Compiling gpkg
04/16/2018 04:51:34 PM INFO: Cleaning up
04/16/2018 04:51:34 PM INFO: Testing gpkg package works
Traceback (most recent call last):
  File "/usr/local/bin/graftM", line 4, in <module>
Hi, That seems like it might be a proper bug with GraftM. Are you able to send me the sequences you are trying to work with and the taxonomy file you used please? Just to my email which you can see at http://ecogenomic.org/personnel/dr-ben-woodcroft Thanks, ben
Hi, I ran graftM create intending to create a gpkg for a list of protein sequences and got the following error:
04/15/2018 03:46:24 PM INFO: Building gpkg for GUS923.gpkg
04/15/2018 03:46:24 PM INFO: Building seqinfo and taxonomy file from input taxonomy
04/15/2018 03:46:24 PM INFO: Checking for duplicate sequences
04/15/2018 03:46:24 PM INFO: Aligning sequences to create aligned FASTA file
04/15/2018 03:46:48 PM INFO: Building HMM from alignment
04/15/2018 03:46:56 PM INFO: Filtered 0 short sequences from the alignment
04/15/2018 03:46:56 PM INFO: 923 sequences remaining
04/15/2018 03:46:56 PM INFO: Checking for incorrect or fragmented reads
04/15/2018 03:47:23 PM INFO: Building HMM from alignment
04/15/2018 03:47:32 PM INFO: Filtered 0 short sequences from the alignment
04/15/2018 03:47:32 PM INFO: 923 sequences remaining
04/15/2018 03:47:33 PM INFO: Deduplicating sequences
04/15/2018 03:47:33 PM INFO: Removed 47 sequences as duplicates, leaving 876 non-identical sequences
04/15/2018 03:47:33 PM INFO: Building tree
Traceback (most recent call last):
  File "/usr/local/bin/graftM", line 4, in <module>
    __import__('pkg_resources').run_script('graftm==0.11.1', 'graftM')
  File "/Users/weibin/Library/Python/2.7/lib/python/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/weibin/Library/Python/2.7/lib/python/site-packages/pkg_resources/__init__.py", line 1527, in run_script
    exec(code, namespace, namespace)
  File "/Library/Python/2.7/site-packages/graftm-0.11.1-py2.7.egg/EGG-INFO/scripts/graftM", line 410, in <module>
    Run(args).main()
  File "/Library/Python/2.7/site-packages/graftm-0.11.1-py2.7.egg/graftm/run.py", line 657, in main
    threads = self.args.threads
  File "/Library/Python/2.7/site-packages/graftm-0.11.1-py2.7.egg/graftm/create.py", line 730, in main
    self.fasttree)
  File "/Library/Python/2.7/site-packages/graftm-0.11.1-py2.7.egg/graftm/create.py", line 220, in _build_tree
    extern.run(cmd)
  File "build/bdist.macosx-10.13-intel/egg/extern/__init__.py", line 46, in run
extern.ExternCalledProcessError: Command fasttree -quiet -log /var/folders/yn/9h2_d_7556970rsv05lmd6p80000gn/T/tmpP9qa3Y/GUS923.tre.log -out /var/folders/yn/9h2_d_7556970rsv05lmd6p80000gn/T/tmpP9qa3Y/GUS923.tre /var/folders/yn/9h2_d_7556970rsv05lmd6p80000gn/T/tmpP9qa3Y/GUS923_deduplicated_aligned.fasta returned non-zero exit status -11.
STDERR was: Ignored unknown character X (seen 12 times)
STDOUT was:
Does anyone know the reason, and how to fix it? Thanks in advance.
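The exit status -11 in the error above and the 139 printed by `echo $?` earlier in the thread describe the same event: fasttree was killed by signal 11 (SIGSEGV, a segmentation fault). POSIX shells report death by signal N as 128 + N, while Python's subprocess module reports it as -N; the negative value in GraftM's message follows the Python convention. The helper below is a minimal, hypothetical sketch (Python 3, not part of GraftM) that decodes both conventions.

```python
import signal


def _signal_name(signum):
    """Return the symbolic name of a signal number, or None if it is unknown."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None


def describe_exit(returncode):
    """Return a short, human-readable description of a child-process exit status."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # Python's subprocess reports a process killed by signal N as -N.
        name = _signal_name(-returncode)
        if name:
            return "killed by " + name
    elif returncode > 128:
        # POSIX shells report the same event as 128 + N (e.g. `echo $?` -> 139).
        # Assumption: a status above 128 came from a signal, not a real exit code.
        name = _signal_name(returncode - 128)
        if name:
            return "killed by " + name + " (shell convention)"
    return "exited with status %d" % returncode
```

Applied to the two reports in this thread, describe_exit(139) and describe_exit(-11) both resolve to SIGSEGV, i.e. fasttree itself crashed rather than GraftM.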