biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

question about example 01 S. aureus, Step 4. Add S. aureus reference genomes #11

Closed by zckoo007 4 years ago

zckoo007 commented 4 years ago

When I run the following command, I get an error:

../../phylophlan.py \
    -i input_references \
    -o output_references \
    -d s__Staphylococcus_aureus \
    -t a -f references_config.cfg \
    --nproc 10 \
    --subsample twentyfivepercent \
    --diversity low \
    --fast \
    2>&1 |tee logs/phylophlan__reference_genomes__s__Staphylococcus_aureus.log
"output_references/tmp/sub/A0A2C9TN08.aln" generated in 4s
Concatenating alignments
Alignments concatenated "output_references/input_references_concatenated.aln" in 2s
Building phylogeny "output_references/input_references_concatenated.aln"

[e] Command '['FastTreeMP', '-quiet', '-mlacc', '2', '-slownni', '-spr', '4', '-fastest', '-mlnni', '4', '-no2nd', '-lg', '-out', '/public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre', 'output_references/input_references_concatenated.aln']' returned non-zero exit status 1.

[e] error while executing
    command_line: FastTreeMP -quiet -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -lg -out /public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre output_references/input_references_concatenated.aln
           stdin: None
          stdout: None

Then I ran the failing FastTreeMP command directly and got a different error:

FastTreeMP -quiet -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -lg -out /public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre output_references/input_references_concatenated.aln
Wrong number of characters for GCA_000543025: expected 1257 but have 712 instead.
This sequence may be truncated, or another sequence may be too long.

So I checked the input_references_concatenated.aln file:

>GCA_000543025
VAVVRQNVSAASNVAIDKVTPPFTKSHNVSPNGKNITATTRTTLLKSLSQTENRMMNSDS
ANEQANPAFLAIHFLNANIKINVIVAAKSHLISVNTNTNNVHKAILAREFEAAMQSDAAV
LIRTQPTVEIFTVATLRKNAKVAHSALDSIFLLASIVVLFISHFIMKIFTLRALNSEDAK
GTSDKIQSEETNQQKITTKDITHDQVQYHNRWNNNAAYTINQNRNFNFALKHIPTNFTIM
KRSSVHLFLDMGV------FMVNKAVSANERFQQQNDAANNGSVQFPNHQNDTTSANEQY
QQQNDAANQTRVDVANTVELMVILASYSAFKSQTKIAIIQFQIACVTQAVIYEKPWLSML
YRAYNNTAMTTYNVRVYSSQAGNFWRAYNNTSMSTYDILIYPSEVSAFTSAAFTGKHHES
EKSYCNKSRREETTNSLFSTLLGFVISHRYKRPLFSTLLRFVINFEYYESMLASFYGVIA
SHRQKEPHATSKFKQGEVWSKTIQYQYFTEDETYTAQEAAASFKSTHPNSESRAMQCYEF
EEFRNEVKSNIVAVLTFSMIEWHYRRAIERAVLQPSIIEKEFEGHSIERNLIYRKNKLYE
AMAMDNTNSDTTVQDTNVANNGLSAQASGSATSVSPQTGNTVSATTNNGGDAAYASGTDF
ANTDIAFDYETDKPVKDTYTPNDSVNENGLVTDTANTTNTVETITKAKATVA
>GCA_900041155
VVLGRQNVSAASDAAIERIKPPFTKSHNVSPNDKNIASAAKTALLKNLSQTKDRMMNSNG
ANKQANPAFLAIHFLNANTKINVIVAANTQSVSANTNTSNVHQALLIREFEAAMESDAAV
LIRTQSIVEIYAVVTFRKKTKVAHNTLNTIFLLASIVVLFISHFIMKIFTLRALNSEDAK
GTSDNIQSEETNQQKITTKDITHDQVQYHNRWNNNAAYTINQNWYDKGDKGQSFKVRENR
NFNFALKHIPTNFTIMKRSSVHLFLDMGFMANKAVSANERFQQQNDAANSANEQYQQQND
AANQTRVDESNAVQFQIACVTQAVIYEKPWLSMLELMVILASYSAFKSQTKIAIIWRAYN
NTSMSTYDILIYPSEVSAFYRAYNNTAMTTYNVRVYSSQAGNFYRAYNNTAMTTYNVRVY
SNQASNFWKSFDRHSLTVFDILIWSSEVSSYTSAAFTGKHHESEKSYCNKSRREETTNSL
FSTLLGFVISHRYKRPLFSTLLRFVINFEYYESMLASFYGVIASHRQKEPHATSKFKQGE
VWSKTIQYQYFTEDETYTAQEAAASFKSTHPNSESRAMQCYEFEEFRNEVNTSEHYRRAI
ERAVSKEFEHYAKNPSKEEHQEPTTYQTNNTYANTCSAENFYKAKYSRQTTQHVIGIMAL
TLLSKKFKQGEVWSKEGKTVCSIQYQAYTLDDQVMFIEDAIVATLKLEKSKSNIVAVLTF
SMIEWHYRRAIERAVLQPSIIEKEFEGHSIERNLIYRKNKLYEAMAMDNTQGESLGHNTN
VDTSDISSQTSVGVMPVPSSSAKSAATNTNDDRDAAYISGTDFANVDVGFDYESDKQIKD
TFSPEDSVNENRLVADMVDATNTIEALAQANNITA
...

I am a novice working through all of your examples. Could you give me some help?
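
The FastTreeMP message above reports a record with 712 characters where 1257 were expected. One way to see which records in a concatenated alignment deviate is to tally the length of every record. The following is a minimal Python sketch, not part of PhyloPhlAn: the function names are hypothetical, and treating the most common length as the expected one is an assumption of this sketch rather than FastTree's own rule.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return {record_id: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current] = 0
        elif current is not None:
            # Gap characters ('-') count as alignment columns too.
            lengths[current] += len(line)
    return lengths


def find_length_mismatches(lengths):
    """Return (expected_length, {id: length}) for records that deviate.

    Assumption of this sketch: the most common length is the expected one.
    """
    if not lengths:
        return None, {}
    expected = Counter(lengths.values()).most_common(1)[0][0]
    return expected, {k: v for k, v in lengths.items() if v != expected}


# Tiny in-memory demonstration with made-up records.
demo = [">a", "ACDE", "FG--", ">b", "ACDE", "FG--", ">c", "ACD"]
lengths = read_fasta_lengths(demo)
expected, bad = find_length_mismatches(lengths)
print(expected, bad)
```

To apply it to the file from this thread, the open file handle for output_references/input_references_concatenated.aln can be passed to read_fasta_lengths in place of the demo list.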

fasnicar commented 4 years ago

Hi, did you run PhyloPhlAn from scratch, or were there temp files left over from a previous run that was halted? If the latter, I suggest you remove the output folder and re-run PhyloPhlAn, as temp files that were not correctly cleaned up from the previous run are most likely causing this issue.

Many thanks, Francesco
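
The clean-up suggested above amounts to deleting the whole output folder before re-running. A minimal Python sketch of that step follows, assuming the folder is named output_references as in the command from this thread; the helper name is hypothetical and not part of PhyloPhlAn. The demonstration works on a throwaway temporary directory so that nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def remove_output_dir(output_dir):
    """Delete output_dir and everything under it; return True if it existed."""
    path = Path(output_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demonstration on a temporary directory that mimics a halted run.
demo_root = Path(tempfile.mkdtemp())
demo_out = demo_root / "output_references"
(demo_out / "tmp" / "sub").mkdir(parents=True)
(demo_out / "tmp" / "sub" / "stale.aln").write_text(">x\nAC\n")

removed = remove_output_dir(demo_out)        # folder existed and is deleted
removed_again = remove_output_dir(demo_out)  # nothing left to delete

shutil.rmtree(demo_root)  # clean up the temporary parent directory
```

After the folder is gone, the original phylophlan.py command can be re-run unchanged so that all intermediate files are regenerated.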

zckoo007 commented 4 years ago

I removed the whole output folder, but it takes one or two days to get a result. Could you provide a lighter, simpler example?

fasnicar commented 4 years ago

To speed it up we could remove some of the input genomes, but then we won't be sure what is causing the problem. So, in this case, I would prefer that you run the whole example from scratch.

Thanks, Francesco

zckoo007 commented 4 years ago

Hi fasnicar, thank you very much for your patient help. I've run through all five examples. This is really a good tool!

fasnicar commented 4 years ago

Many thanks for testing all examples and using our tool!