SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

is it possible to adjust the core gene threshold? #60

Open ramiroricardo opened 3 years ago

ramiroricardo commented 3 years ago

Hi Sion,

Is there a way to control the % of genomes that must have a gene for it to be considered core? From what I understood it is set at 95%, but thresholds like 99% are also common in the literature.

Thanks

SionBayliss commented 3 years ago

No problem! For which part of the pipeline would you like to adjust the threshold? PIRATE only explicitly separates genes into core/accessory at certain steps, such as when it generates the core alignment and when it plots summary figures and tables. These steps could be tweaked to use a more meaningful threshold for your analysis, and could most likely be run after your analysis has finished so that it does not have to be repeated.

ramiroricardo commented 3 years ago

Hi Sion,

Thanks for your reply. I think it would be great to have such a threshold when the core alignment is generated. Ideally, though, the same threshold would then be applied to the summary plots/tables to keep everything consistent.

SionBayliss commented 3 years ago

I will label this as an enhancement for the next release.

In the meantime, changing the outputs to support this is relatively simple. The gene alignments can be generated using the scripts in the PIRATE/scripts directory inside your PIRATE output directory:

alignment:

create_pangenome_alignment.pl --dosage 1.25 -t 99 -i PIRATE.gene_families.ordered.tsv -f ./feature_sequences/ -o core_alignment.fasta -g core_alignment.gff

Plots are a little more complicated. You will need to search and replace 95 with 99 inside the plot_summary.R script (open it in a text editor) and then run it using:

Rscript plot_summary.R ./
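
A blind search and replace can also hit unrelated numbers that happen to contain 95 (for example 1995 or 950). As a minimal sketch of a more careful replacement, the Python below only swaps 95 where it is not part of a longer run of digits. The helper name `replace_threshold` and the example text are hypothetical and are not taken from the PIRATE script itself, so check every changed line before running the plots.

```python
import re


def replace_threshold(source, old="95", new="99"):
    """Replace `old` with `new` only where it is not part of a longer number.

    Returns the updated text and the number of replacements made.
    """
    # Lookarounds reject matches that have a digit directly before or after,
    # so 1995 and 950 are left alone while 95, 0.95 and (95%) are changed.
    pattern = re.compile(r"(?<!\d)" + re.escape(old) + r"(?!\d)")
    return pattern.subn(new, source)


# Hypothetical stand-in text; the real script's contents are not shown here.
example = "core <- 95\nyear <- 1995\nfrac <- 0.95\nlabel <- 'core (95%)'\n"
updated, n_changed = replace_threshold(example)
print(n_changed)
print(updated)

# Possible use on a copy of the script (assumes the file is in the
# current directory):
#   text = open("plot_summary.R").read()
#   new_text, n = replace_threshold(text)
#   open("plot_summary_99.R", "w").write(new_text)
```

Writing the result to a copy rather than overwriting the original keeps the 95% version available for comparison.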

Hope that helps, Sion

ramiroricardo commented 3 years ago

Thanks a lot, will test this soon!

haruosuz commented 3 years ago

I look forward to the next release.

I would set roary -cd 100 to generate core_gene_alignment.aln for core genome phylogeny.

https://sanger-pathogens.github.io/Roary/

-cd FLOAT percentage of isolates a gene must be in to be core [99]
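
To see how much the choice of threshold matters, here is a minimal sketch of classifying gene families as core at 95%, 99% and 100%. The function `core_genes`, the toy presence data and the use of "greater than or equal to" at the cutoff are all assumptions for illustration; neither PIRATE nor Roary is guaranteed to handle the boundary the same way.

```python
def core_genes(presence, n_genomes, threshold_pct=95.0):
    """Return the gene families present in at least threshold_pct % of genomes.

    presence maps a gene family name to the genomes that carry it.
    """
    core = []
    for family, genomes in presence.items():
        # Count each genome once, even if it is listed more than once.
        pct = 100.0 * len(set(genomes)) / n_genomes
        if pct >= threshold_pct:
            core.append(family)
    return sorted(core)


# Hypothetical toy data set with 200 genomes.
presence = {
    "g001": range(200),  # present in 100% of genomes
    "g002": range(198),  # 99%
    "g003": range(192),  # 96%
    "g004": range(100),  # 50%
}

core_95 = core_genes(presence, 200, 95.0)
core_99 = core_genes(presence, 200, 99.0)
core_100 = core_genes(presence, 200, 100.0)
print(core_95)
print(core_99)
print(core_100)
```

In this toy example the core set shrinks from three families at 95% to two at 99% and one at 100%, which is why using the same threshold for the alignment and the summary plots keeps the outputs consistent.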