SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

pangenome_alignment #92

Open · ingridvanw opened this issue 2 months ago

Hi guys,

I ran PIRATE with the following command line:

    PIRATE -i /gff/ --steps 50,60,70,80,90,95,98 --features CDS --align --rplots --threads 4 --output /results/

The input is around 500 GFF files from Staphylococcus aureus.

The GFF files look like this:

    ##gff-version 3
    ##sequence-region gnl|Bactopia|SAB003_1 1 854818
    ##sequence-region gnl|Bactopia|SAB003_2 1 492009
    ##sequence-region gnl|Bactopia|SAB003_3 1 236753
    ##sequence-region gnl|Bactopia|SAB003_4 1 199243
    ##sequence-region gnl|Bactopia|SAB003_5 1 179702
    ##sequence-region gnl|Bactopia|SAB003_6 1 140822
    ##sequence-region gnl|Bactopia|SAB003_7 1 134421
    ##sequence-region gnl|Bactopia|SAB003_8 1 108311
    ##sequence-region gnl|Bactopia|SAB003_9 1 91633
    ##sequence-region gnl|Bactopia|SAB003_10 1 79645
    ##sequence-region gnl|Bactopia|SAB003_11 1 67889
    ##sequence-region gnl|Bactopia|SAB003_12 1 39179
    ##sequence-region gnl|Bactopia|SAB003_13 1 33043
    ##sequence-region gnl|Bactopia|SAB003_14 1 29835
    ##sequence-region gnl|Bactopia|SAB003_15 1 25963
    ##sequence-region gnl|Bactopia|SAB003_16 1 18742
    ##sequence-region gnl|Bactopia|SAB003_17 1 11161
    ##sequence-region gnl|Bactopia|SAB003_18 1 2094
    ##sequence-region gnl|Bactopia|SAB003_19 1 1649
    ##sequence-region gnl|Bactopia|SAB003_20 1 1458
    ##sequence-region gnl|Bactopia|SAB003_21 1 1039
    ##sequence-region gnl|Bactopia|SAB003_22 1 501
    ##sequence-region gnl|Bactopia|SAB003_23 1 431
    ##sequence-region gnl|Bactopia|SAB003_24 1 410
    ##sequence-region gnl|Bactopia|SAB003_25 1 403
    ##sequence-region gnl|Bactopia|SAB003_26 1 360
    ##sequence-region gnl|Bactopia|SAB003_27 1 315
    ##sequence-region gnl|Bactopia|SAB003_28 1 310
    gnl|Bactopia|SAB003_1	prokka	gene	944	1540	.	-	.	ID=SAB003_00001_gene;Name=recR;gene=recR;locus_tag=SAB003_00001
    gnl|Bactopia|SAB003_1	Prodigal:002006	CDS	944	1540	.	-	0	ID=SAB003_00001;Parent=SAB003_00001_gene;Name=recR;gene=recR;inference=ab initio prediction:Prodigal:002006;locus_tag=SAB003_00001;product=recombination mediator RecR;protein_id=gnl|Bactopia|SAB003_00001
    .....
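To sanity-check an input file like the one above, a small Python sketch can list the contigs declared by the `##sequence-region` directives and count the feature types in column 3 of the annotation lines. It assumes the standard GFF3 layout that Prokka writes (tab-separated nine-column feature lines, with contig sequences after a `##FASTA` directive); `summarise_gff` is a hypothetical helper for illustration, not part of PIRATE.

```python
from collections import Counter


def summarise_gff(lines):
    """Summarise a Prokka-style GFF3 file (illustrative helper, not PIRATE code).

    Returns (regions, feature_types): regions maps each contig declared by a
    ##sequence-region directive to its length, and feature_types counts the
    values in column 3 (gene, CDS, ...) of the nine-column feature lines.
    """
    regions = {}
    feature_types = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is contig sequence, not annotation
        if line.startswith("##sequence-region"):
            # Directive layout: ##sequence-region <seqid> <start> <end>
            _, seqid, start, end = line.split()
            regions[seqid] = int(end) - int(start) + 1
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 9:  # skip anything that is not a tab-separated feature line
                feature_types[cols[2]] += 1
    return regions, feature_types


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "##sequence-region gnl|Bactopia|SAB003_1 1 854818",
        "##sequence-region gnl|Bactopia|SAB003_28 1 310",
        "gnl|Bactopia|SAB003_1\tprokka\tgene\t944\t1540\t.\t-\t.\tID=SAB003_00001_gene",
        "gnl|Bactopia|SAB003_1\tProdigal:002006\tCDS\t944\t1540\t.\t-\t0\tID=SAB003_00001",
    ]
    regions, feature_types = summarise_gff(sample)
    print(len(regions), "contigs;", feature_types["CDS"], "CDS features")
```

With a real file, pass an open file handle in place of the sample list.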

I was wondering why there are so many N's in pangenome_alignment.fasta. It looks as though every individual isolate has been added to the pangenome alignment?
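One way to see where the N's sit is to measure, per isolate, how much of each aligned sequence is N and, separately, how much is `-` gap. The Python sketch below assumes pangenome_alignment.fasta is an ordinary multi-FASTA alignment with one record per isolate; `read_fasta` and `missing_fraction` are hypothetical helpers, not PIRATE functions, and whether PIRATE pads gene families that are absent from an isolate with N or with gaps is an assumption to confirm against its documentation.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the isolate name
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def missing_fraction(lines):
    """Map each record to (fraction of N characters, fraction of '-' gaps)."""
    result = {}
    for name, seq in read_fasta(lines):
        seq = seq.upper()
        length = len(seq) or 1  # avoid dividing by zero on an empty record
        result[name] = (seq.count("N") / length, seq.count("-") / length)
    return result


if __name__ == "__main__":
    toy_alignment = [">SAB003", "ATGNNNN---", ">SAB004", "ATGCATGC--"]
    for name, (n_frac, gap_frac) in missing_fraction(toy_alignment).items():
        print(f"{name}\tN={n_frac:.2f}\tgap={gap_frac:.2f}")
```

On the real output, passing an open handle to the alignment (here assumed to sit in the `--output` directory, /results/) gives one row per isolate. If the N's come in long runs concentrated in particular isolates, padding of missing genes is a more likely source than ambiguous base calls.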