bactopia / bactopia

A flexible pipeline for complete analysis of bacterial genomes
https://bactopia.github.io
MIT License

[feature] Bactopia Tools - Roary/PIRATE improvements #159

Open rpetit3 opened 3 years ago

rpetit3 commented 3 years ago

Suggestions provided by @haruosuz

References:

https://github.com/SionBayliss/PIRATE#input-format Input format: PIRATE accepts GFF3 annotation files containing matching nucleotide sequence at the end of the file.

https://sanger-pathogens.github.io/Roary/ Input files: Roary takes GFF3 files as input. They must contain the nucleotide sequence at the end of the file.

http://www.iqtree.org/doc/Frequently-Asked-Questions#how-does-iq-tree-treat-gapmissingambiguous-characters How does IQ-TREE treat gap/missing/ambiguous characters? Gaps (-) and missing characters (? or N for DNA alignments) are treated in the same way as unknown characters, which represent no information.

https://evolution.genetics.washington.edu/phylip/doc/consense.html Consense -- Consensus tree program

https://github.com/harry-thorpe/piggy Piggy is a tool for analysing the intergenic component of bacterial genomes. It is designed to be used in conjunction with Roary (https://github.com/sanger-pathogens/Roary).

The output folder produced by Roary is required as an input to Piggy (specified by --roary_dir).

https://github.com/AdmiralenOla/Scoary Scoary is designed to take the gene_presence_absence.csv file from Roary

LS-BSR input: You can also use as input the pan-genome as called from Jason Sahl's program LS-BSR (Large-Scale Blast Score Ratio).
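
Both Roary and PIRATE, as quoted above, expect GFF3 files that carry the nucleotide sequence at the end of the file. In GFF3 this is conventionally done with a `##FASTA` directive followed by FASTA records, which is how Prokka writes its `.gff` output. Below is a minimal sketch of a pre-flight check, assuming that convention. The function names are hypothetical and are not part of Bactopia, Roary, or PIRATE.

```python
import gzip


def lines_have_embedded_fasta(lines):
    """Return True if a '##FASTA' directive is followed by at least one '>' header.

    `lines` is any iterable of text lines from a GFF3 file (assumption: the
    sequence section uses the standard '##FASTA' directive).
    """
    seen_directive = False
    for line in lines:
        if not seen_directive:
            # Everything before the directive is annotation; keep scanning.
            if line.strip() == "##FASTA":
                seen_directive = True
        elif line.startswith(">"):
            # First FASTA header after the directive: sequence is embedded.
            return True
    return False


def gff_has_embedded_fasta(path):
    """Open a plain or gzip-compressed GFF3 file and run the check above."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return lines_have_embedded_fasta(handle)
```

Running `gff_has_embedded_fasta("sample.gff")` on each input before starting a pan-genome run would flag annotation-only GFF3 files early. The check only confirms that a sequence section exists, not that its contig names match the annotation.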

haruosuz commented 3 years ago

Dear @rpetit3. Here are additional suggestions:

kusandeep commented 3 years ago

Dear @rpetit3,

I am wondering about the use of Scoary (https://github.com/AdmiralenOla/Scoary) with Bactopia, and I look forward to seeing it as a Bactopia tool.

Thanks, Sandeep

haruosuz commented 3 years ago

Dear @rpetit3:

https://github.com/SionBayliss/PIRATE#usage
 --pan-opt      additional arguments to pass to pangenome_contruction

 -z             retain intermediate files [0 = none, 1 = retain pangenome 
                files (default - re-run using --pan-off), 2 = all]

https://bactopia.github.io/bactopia-tools/pirate/
    --keep_all_files            Retain all intermediate files

By the way, this link (https://doi.org/10.1128/mSystems.00190-20) redirects to this page (https://journals.asm.org/journal/msystems).

rpetit3 commented 3 years ago

Hi @haruosuz

Apologies for the delay in responding, I'm going to look into this. I'll also get the link fixed, thank you for pointing it out!

Robert

rpetit3 commented 3 years ago

Hi @haruosuz, I've corrected the syntax to match PIRATE's (thank you for pointing that out!). It will be in the next version of Bactopia (v1.7.1).

@kusandeep - I'm going to work on adding Scoary, thank you for suggesting it!

haruosuz commented 3 years ago

Dear @rpetit3:

Can Bactopia generate a core genome phylogeny using FastTree as well as IQ-TREE? Could FastTree generate a tree (.nwk) from core_alignment.fasta.gz as well as from binary_presence_absence.fasta.gz?

Supplying the IQ-TREE core genome phylogeny to Scoary with --newicktree bactopia-tools/pirate/core-genome/iqtree/core-genome.treefile produced the following error:

ete3.parser.newick.NewickError: Unexpected newick format '100/100:0.0616018898'

rpetit3 commented 3 years ago

I can totally add support for FastTree in the next version.
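
The NewickError reported above appears to come from internal node labels of the form `100/100`. IQ-TREE writes support as a pair (SH-aLRT/UFBoot) when both tests are requested, and a parser expecting a single numeric support value rejects it. As a possible workaround until another tree builder is available, the sketch below rewrites each paired label to a single value. It uses only the standard library; the function name and the `keep` parameter are hypothetical, and which of the two values to keep is an assumption left to the user.

```python
import re

# A support pair directly after a closing parenthesis, e.g. ")100/100:" or ")98.5/100,".
_SUPPORT_PAIR = re.compile(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)(?=[:,;)])")


def simplify_support_labels(newick, keep=1):
    """Replace 'a/b' internal node labels with a single value.

    keep=0 keeps the first value of each pair, keep=1 keeps the second.
    Labels that are not a numeric pair after ')' are left untouched.
    """
    if keep not in (0, 1):
        raise ValueError("keep must be 0 or 1")
    return _SUPPORT_PAIR.sub(lambda m: ")" + m.group(keep + 1), newick)


def simplify_treefile(src, dst, keep=1):
    """Read a Newick file, simplify its support labels, and write the result."""
    with open(src) as handle:
        cleaned = simplify_support_labels(handle.read(), keep=keep)
    with open(dst, "w") as handle:
        handle.write(cleaned)
```

For example, `simplify_support_labels("((A:0.1,B:0.2)100/100:0.0616018898,C:0.3);")` returns `((A:0.1,B:0.2)100:0.0616018898,C:0.3);`. The cleaned file could then be passed to Scoary's `--newicktree` option in place of the original treefile.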

haruosuz commented 3 years ago

Dear @rpetit3:

Can Bactopia use pan- and core-genome analysis tools such as GET_HOMOLOGUES/GET_PHYLOMARKERS (https://pubmed.ncbi.nlm.nih.gov/29765358/) as well as Roary/PIRATE?

rpetit3 commented 3 years ago

I vote yes! At the moment I'm working on https://github.com/bactopia/bactopia/tree/dsl2, which will be the basis of v2, and once that's available it'll make adding these tools and suggestions much easier.

haruosuz commented 2 years ago

Dear @rpetit3:

I wonder if Bactopia can modify Prokka annotations (gene and product names) and/or PIRATE annotations (in bactopia-tools/pirate/core-genome/pirate/PIRATE.gene_families.tsv)? For example, could annotations from databases (e.g. MEGARes, VFDB, eggNOG) be appended to (or substituted for) the Prokka/PIRATE annotations in Bactopia?

rpetit3 commented 2 years ago

I think that's a great idea, and something we can plan for v2 (maybe not initial v2 release, but something to add)

haruosuz commented 2 years ago

I noticed that the documentation page "pirate - Bactopia" (https://bactopia.github.io/bactopia-tools/pirate/) has changed a lot. Is there an archive of the documentation for Bactopia version 1.X.X?

rpetit3 commented 2 years ago

I think MkDocs Material added versioned docs. I'll see what I can do about this.