maguileraf closed 4 years ago
The pangenome alignment excludes certain potentially problematic sequences: those with a high copy number or a large number of truncations. Any gene family with an average copy number (gene dosage) above 1.25 is left out of the alignment. This threshold can be changed with the --dosage option of the alignment scripts:
align_feature_sequences.pl --dosage 1.25 -i PIRATE.*.tsv -g ./modified_gffs/ -o ./feature_sequences/ -p threads;
create_pangenome_alignment.pl --dosage 1.25 -i PIRATE.*.tsv -f ./feature_sequences/ -o pangenome_alignment.fasta -g pangenome_alignment.gff;
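To see how many families the dosage rule alone removes, you could count them directly in the gene-family table. The sketch below assumes the table is tab-separated with a column called "average_dose" holding each family's average dosage. That column name is an assumption, so check the header of your own PIRATE.*.tsv. It applies only the rule described above (drop families with dosage > 1.25); families removed for truncations would need a separate check.

```python
import csv
import io
from typing import Iterable

# ASSUMPTION: name of the column holding each family's average dosage.
# Check the header line of your own gene-family table before relying on it.
DOSAGE_COLUMN = "average_dose"


def read_gene_families(handle) -> list[dict]:
    """Read a tab-separated gene-family table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def split_by_dosage(rows: Iterable[dict], max_dosage: float = 1.25,
                    column: str = DOSAGE_COLUMN) -> tuple[list[dict], list[dict]]:
    """Split rows into (kept, excluded): a family is excluded when its
    average dosage is strictly greater than max_dosage."""
    kept, excluded = [], []
    for row in rows:
        if float(row[column]) > max_dosage:
            excluded.append(row)
        else:
            kept.append(row)
    return kept, excluded


if __name__ == "__main__":
    # Toy table with made-up values standing in for PIRATE.*.tsv.
    toy = "gene_family\taverage_dose\ng0001\t1.00\ng0002\t1.25\ng0003\t2.10\n"
    kept, excluded = split_by_dosage(read_gene_families(io.StringIO(toy)))
    print(len(kept), "kept;", len(excluded), "excluded")
```

For a real run, open the table with open(path, newline="") and pass the handle to read_gene_families; raising max_dosage mirrors raising --dosage.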
For multi-copy gene families, PIRATE picks the longest representative sequence from each genome to include in the alignment.
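The selection rule above can be sketched as follows. This is not PIRATE's code. The (genome, locus_id, sequence) record layout is a hypothetical input format, and the tie-breaking (keep the first copy seen when two copies are equally long) is an assumption, since the rule as stated only says "longest".

```python
def longest_per_genome(loci: list[tuple[str, str, str]]) -> dict[str, tuple[str, str]]:
    """Given (genome, locus_id, sequence) records for one gene family,
    return {genome: (locus_id, sequence)} keeping only the longest copy
    per genome. ASSUMPTION: on ties, the first copy seen is kept."""
    best: dict[str, tuple[str, str]] = {}
    for genome, locus_id, seq in loci:
        current = best.get(genome)
        # Replace the stored copy only when this one is strictly longer.
        if current is None or len(seq) > len(current[1]):
            best[genome] = (locus_id, seq)
    return best


if __name__ == "__main__":
    # Hypothetical family: genomeA carries two copies, genomeB one.
    family = [
        ("genomeA", "A_0001", "ATGAAATGA"),
        ("genomeA", "A_0002", "ATGAAACCCTGA"),
        ("genomeB", "B_0001", "ATGTGA"),
    ]
    print(longest_per_genome(family))
```

Each genome then contributes exactly one sequence per family, which keeps the rows of the alignment one per genome.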
All the best, Sion
Thank you for the explanation. It makes more sense now.
Glad I could help!
Hi,
My gene_families.tsv file has ~11,000 entries, but the annotation file (pangenome_alignment.gff) has only ~7,500. I can't work out why so many gene families are missing.
Thanks in advance, Marcela