broadinstitute / SynerClust

source code for SynerClust

Error building repo_spec #12

Closed Xanthomonass closed 4 years ago

Xanthomonass commented 4 years ago

Hi,

I am having trouble with the first steps of SynerClust and cannot figure out what is going wrong. I am trying to apply it to a set of 80 genomes (a mix of draft and complete), using a Newick tree built with PhyloPhlan2 as input.

I always get the error message that one of my genomes is present in the tree but absent in repo_spec.

bin/synerclust.py -w wd/ -r wd/paths.txt -t wd/xanthomonadaceae.nwk --run single -n 3

Started
Wrote locus tags to locus_tag_file.txt
reading genome to locus
reading tree [TREE.NWK]
parsing tree
Error: Genome Stenotrophomonas_maltophilia_CFBP3035 found in the tree but not in the repo_spec.

I checked the spelling of the names between all the input files multiple times and nothing's wrong.

Here are the log and paths files: locus_tag_file.txt, needed_extractions.cmd.txt, paths.txt, run_SynerClust.log

Could you help me understand what's wrong here?

Thanks.

GeorgescuC commented 4 years ago

Hi @Xanthomonass,

This is odd: I used the locus_tag_file.txt you posted to run some steps manually, and the "Stenotrophomonas_maltophilia_CFBP3035" genome should be found correctly at the point where the error occurs. From the log, it also seems that the genomes appearing earlier in the tree are found properly. Are you using the latest version of the code, and are you running it with Python 2.7? If not, it may help to try the Docker image. If you are, would you be willing to share the nwk tree you are using privately, so that I can debug with the same inputs as you? I only need the tree, not the FASTA or GFF files, and I would keep it private. You can send it to cgeorges broadinstitute org.

Regards, Christophe.

Xanthomonass commented 4 years ago

Hi Christophe,

I installed the program two days ago by downloading it from the Git repository, so I should have the latest version. I am running it with Python 2.7 and have all the dependencies correctly installed. I will send you my tree; I hope you can figure out what I am doing wrong here!

Thanks a lot for your help, Lucas

GeorgescuC commented 4 years ago

The issue should now be fixed. The Newick string had line breaks after every one or two genome names, so the genome name read from the tree was actually "\r\nStenotrophomonas_maltophilia_CFBP3035" rather than the expected "Stenotrophomonas_maltophilia_CFBP3035".
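
For anyone hitting the same error, here is a minimal sketch of how to spot and remove such hidden line breaks in a tree file before running SynerClust. It is not the fix that went into SynerClust itself: `normalize_newick` and `leaf_names` are hypothetical helper names, `Genome_A` is a made-up genome, and the leaf-name regex is a naive assumption that only works for unquoted names without special characters.

```python
import re


def normalize_newick(text):
    """Remove carriage returns and newlines from a Newick string."""
    return re.sub(r"[\r\n]+", "", text).strip()


def leaf_names(newick):
    """Naively extract leaf names: the text that directly follows '(' or ','
    up to the next Newick delimiter. Assumes unquoted names."""
    return re.findall(r"[(,]([^(),:;]+)", newick)


if __name__ == "__main__":
    # Hypothetical two-leaf tree with a Windows line break before the second name.
    raw = "(Genome_A:0.1,\r\nStenotrophomonas_maltophilia_CFBP3035:0.2);"

    # repr() makes the otherwise invisible "\r\n" prefix visible.
    for name in leaf_names(raw):
        print(repr(name))

    # After normalization the names match what the paths file declares.
    for name in leaf_names(normalize_newick(raw)):
        print(repr(name))
```

Printing names with `repr()` is the useful part for diagnosis: two names that look identical in a text editor can still differ by a leading "\r\n", which is exactly what made the comparison against the repo_spec fail here.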