flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

Pantagrel pipeline task 5: failed #19

Status: Closed (kuzman1306 closed this issue 5 years ago)

kuzman1306 commented 5 years ago

Hi Florent,

The following error occurred when using "real" input data:

reference tree name translation in complete; tree with organism names stored in file '/home/kuzman/vitis3/05.core_genome/core-genome-based_reference_tree_vitis3.full.names'
The input trees are not rooted, use either option -g to specify the outgroups file or -r to estimate the root
ERROR: failed ultrametrization of reference tree
ERROR: Pantagrel pipeline task 5: failed.

LSD is installed correctly.

As you suggested, I checked whether the input files are in place:

kuzman@kuzman-VirtualBox:~$ source /home/kuzman/vitis3/environ_pantagruel_vitis3.sh
kuzman@kuzman-VirtualBox:~$ ls -l ${pseudocorealn}.reduced
ls: cannot access '/home/kuzman/vitis3/05.core_genome/strict-core-unicopy_concat_cds_28-genomes_vitis3.aln.reduced': No such file or directory
kuzman@kuzman-VirtualBox:~$ head -n1 ${pseudocorealn}.reduced
head: cannot open '/home/kuzman/vitis3/05.core_genome/strict-core-unicopy_concat_cds_28-genomes_vitis3.aln.reduced' for reading: No such file or directory
kuzman@kuzman-VirtualBox:~$ ls -l ${pseudocorealn} 
-rw-r--r-- 1 kuzman kuzman 37970307 Aug 15 22:32 /home/kuzman/vitis3/05.core_genome/strict-core-unicopy_concat_cds_28-genomes_vitis3.aln
kuzman@kuzman-VirtualBox:~$ head -n1 ${pseudocorealn}
>AGRALB
kuzman@kuzman-VirtualBox:~$ ls -l ${speciestree}
-rw-r--r-- 1 kuzman kuzman 1302 Aug 16 00:40 /home/kuzman/vitis3/05.core_genome/core-genome-based_reference_tree_vitis3.full
kuzman@kuzman-VirtualBox:~$ ls -ltr ${coregenome}
total 37116
drwxr-xr-x 2 kuzman kuzman     4096 Aug 15 22:31 pseudo-coregenome_sets
-rw-r--r-- 1 kuzman kuzman 37970307 Aug 15 22:32 strict-core-unicopy_concat_cds_28-genomes_vitis3.aln
-rw-r--r-- 1 kuzman kuzman        0 Aug 15 22:32 strict-core-unicopy_concat_cds_28-genomes_vitis3.aln.identical_sequences
lrwxrwxrwx 1 kuzman kuzman       76 Aug 15 22:39 core-genome-based_reference_tree_vitis3.topology -> raxml_tree/RAxML_resultTopo.strict-core-unicopy_concat_cds_28-genomes_vitis3
lrwxrwxrwx 1 kuzman kuzman       74 Aug 15 22:44 core-genome-based_reference_tree_vitis3.branlen -> raxml_tree/RAxML_bestTree.strict-core-unicopy_concat_cds_28-genomes_vitis3
lrwxrwxrwx 1 kuzman kuzman       78 Aug 16 00:40 core-genome-based_reference_tree_vitis3.supports -> raxml_tree/RAxML_bipartitions.strict-core-unicopy_concat_cds_28-genomes_vitis3
drwxr-xr-x 2 kuzman kuzman     4096 Aug 16 00:40 raxml_tree
lrwxrwxrwx 1 kuzman kuzman       76 Aug 16 00:40 core-genome-based_reference_tree_vitis3.rooted -> raxml_tree/RAxML_rootedTree.strict-core-unicopy_concat_cds_28-genomes_vitis3
-rw-r--r-- 1 kuzman kuzman     1302 Aug 16 00:40 core-genome-based_reference_tree_vitis3.full
-rw-r--r-- 1 kuzman kuzman     2123 Aug 16 00:40 core-genome-based_reference_tree_vitis3.full.names

It seems they are not all in place.

Do you have any idea what the problem could be?

Cheers,

Nemanja

flass commented 5 years ago

Hi Nemanja,

Sorry it took me a while to get back to you. I'm not sure what the source of that bug is, given that your files all look OK to me. The absence of the file ${pseudocorealn}.reduced and the empty ${pseudocorealn}.identical_sequences just reflect that none of your genomes are redundant in their concatenated core-genome sequence.
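To illustrate why no redundant genomes means an empty `.identical_sequences` file and no `.reduced` alignment, here is a minimal Python sketch. It is not Pantagruel's code. The function name `group_identical_sequences` is hypothetical, and the sketch assumes sequences are compared by exact string identity. It groups sequence names that share an identical aligned sequence and keeps one representative per group.

```python
from collections import defaultdict


def group_identical_sequences(records):
    """Group names whose aligned sequences are exactly identical.

    records: iterable of (name, sequence) pairs.
    Returns (representatives, duplicate_groups):
      - representatives: one name per distinct sequence, in input order
      - duplicate_groups: lists of 2+ names that share the same sequence
    """
    by_seq = defaultdict(list)  # dicts keep insertion order (Python 3.7+)
    for name, seq in records:
        by_seq[seq].append(name)
    representatives = [names[0] for names in by_seq.values()]
    duplicate_groups = [names for names in by_seq.values() if len(names) > 1]
    return representatives, duplicate_groups
```

When every genome has a distinct core-genome sequence, `duplicate_groups` is empty. In that case there is nothing to list and no reduced alignment needs to be written, which matches the situation described above.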

However, I spotted an issue that made the pipeline handle the FASTA format of the alignment incorrectly in this situation. LSD requires an estimate of the alignment length, i.e. the number of sites, but the pipeline was counting the number of sequences instead. This might have caused the LSD failure you reported. I changed that in commit 47b02d3; you can try again to see whether this fixes your issue.
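The difference between counting sequences and counting sites can be shown with a small Python sketch. It is not the actual change made in commit 47b02d3; the helper name `fasta_alignment_dims` is hypothetical. The sketch assumes an aligned FASTA file, in which every sequence has the same length and may be wrapped over several lines. Header lines starting with `>` give the number of sequences. The length of each (identical-length) sequence gives the number of sites, which is the value LSD needs.

```python
def fasta_alignment_dims(lines):
    """Return (n_sequences, n_sites) for an aligned FASTA given as text lines.

    Handles sequences wrapped over several lines. Raises ValueError if
    sequence data appears before the first header or if the sequences
    differ in length (i.e. the input is not a valid alignment).
    """
    n_seqs = 0
    current = 0
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if n_seqs > 0:
                lengths.append(current)  # close the previous sequence
            n_seqs += 1
            current = 0
        else:
            if n_seqs == 0:
                raise ValueError("sequence data before the first '>' header")
            current += len(line)
    if n_seqs > 0:
        lengths.append(current)  # close the last sequence
    if len(set(lengths)) > 1:
        raise ValueError("sequences differ in length; not an alignment")
    n_sites = lengths[0] if lengths else 0
    return n_seqs, n_sites
```

For the alignment in this issue, the number of sequences should be 28 (going by the file name), whereas the number of sites is the length of one aligned sequence. How the pipeline then passes this value to LSD is not shown here.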

Cheers, Florent

flass commented 5 years ago

I realize that the error:

The input trees are not rooted, use either option -g to specify the outgroups file or -r to estimate the root

comes from LSD itself, so LSD does not see the input tree as rooted, even though you appear to have a rooted tree generated by the last step of the RAxML estimation. Can you verify that the tree ${speciestree} (in your case core-genome-based_reference_tree_vitis3.full) is indeed rooted? It should be, as it is derived from the RAxML output, but you never know...
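One quick way to check this is to count the children of the top-level node in the Newick string. The sketch below is written in Python and uses the hypothetical helpers `root_degree` and `is_rooted`. It relies on a common convention: a rooted binary tree has exactly two children at the root, whereas an unrooted tree is usually written with a basal trifurcation (three children). It also assumes unquoted labels and no `[...]` comments containing parentheses or commas. LSD's own rootedness test may differ in detail.

```python
def root_degree(newick):
    """Count the children of the top-level (root) node of a Newick string.

    Assumes unquoted labels and no bracketed comments containing
    '(', ')' or ','.
    """
    s = newick.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    depth = 0
    children = 0
    for ch in s:
        if ch == "(":
            depth += 1
            if depth == 1:
                children = 1  # the first child of the root starts here
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1  # each top-level comma separates another root child
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return children


def is_rooted(newick):
    """Treat a tree as rooted if its root node is bifurcating."""
    return root_degree(newick) == 2
```

For example, `is_rooted(open(path).read())` should return `True` for the file pointed to by `${speciestree}` if it contains a single, properly rooted tree.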

flass commented 5 years ago

Hi Nemanja,

I think I fixed it in 6ba3a7c. You can try this latest version. If you want to avoid re-running the whole tree-building step, you can use the option -R to resume from the output of RAxML.

Florent

kuzman1306 commented 5 years ago

Hi Florent,

Thank you for your prompt replies today. I will continue with the updated version of the pipeline.

Thank you again for your effort.

Have a nice holiday!

Cheers,

Nemanja

On Fri, Aug 23, 2019 at 7:24 PM Florent Lassalle notifications@github.com wrote:

Hi Nemanja,

I think I fixed it in 6ba3a7c https://github.com/flass/pantagruel/commit/6ba3a7c8c4a60a1c4a19ac8b779f51760be09bc3. You can try this latest version. If you want to avoid re-running the whole tree-building step, you can use the option -R to resume from the output of RAxML.

Florent


kuzman1306 commented 5 years ago

Hi Florent,

Unfortunately, I am getting an error in Task 5 again:

35 nodes without branch support documented
Traceback (most recent call last):
  File "/home/kuzman/pantagruel_pipeline/pantagruel/scripts/putidentseqbackintree.py", line 42, in <module>
    intree.resolveNode([outgroup])
  File "/home/kuzman/pantagruel_pipeline/pantagruel/python_libs/tree2/Node.py", line 3052, in resolveNode
    raise IndexError, "please provide sub-root node(s) as outgroup(s)"
IndexError: please provide sub-root node(s) as outgroup(s)
ERROR: failed re-introducing identical sequences into reference tree
ERROR: Pantagrel pipeline task 5: failed.

I look forward to hearing from you.

Cheers,

Nemanja

flass commented 5 years ago

Hi Nemanja, sorry for the delayed response; I was away from my computer. I am confused, as I tested this version specifically with your trees. Maybe you were missing the patch to the dependent Python library tree2/, which is set up as a submodule of this repo? When updating the repository, you should use:

git pull
git submodule update

please let me know if that fixes it. Florent

flass commented 5 years ago

Hi Nemanja,

It turns out my previous fixes introduced more bugs (as revealed by @pveber's tests), so I had to make a few more changes. As of commit 0787f0e, this should be fixed and stable.

Again, this involved modifying the dependency module tree2, so make sure to update fully with:

git pull
git submodule update