gaps in core_alignment.fasta

SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.

GNU General Public License v3.0


It looks like your plasmid sequences are quite divergent. Of your core genes (~25% of total genes), almost all show ~30-50% sequence divergence. Additionally, many of those genes also vary considerably between their min and max lengths.

Ns/- are added to represent sequences/genes missing in individual isolates, e.g. if genomeA does not have a copy of gene1, then genomeA's entry for the full length of the gene1 alignment will be filled with Ns. Similarly, Ns/- will appear where MAFFT introduces alignment gaps between divergent or different-length sequences.
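To get a feel for how much of each isolate's row is made up of these fill characters, something like the following rough sketch may help. It is not part of PIRATE; the function names `read_fasta` and `missing_fraction` and the toy sequences are made up for illustration, and it assumes the alignment is a plain FASTA file where N and - are the only fill characters.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a {name: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def missing_fraction(seq):
    """Fraction of alignment columns that are N or - for one sequence."""
    if not seq:
        return 0.0
    missing = sum(1 for char in seq if char in "Nn-")
    return missing / len(seq)


# Toy alignment (hypothetical data): genomeA lacks most of the gene,
# genomeB has a single alignment gap.
example = """>genomeA
ATGNNNNNNN
>genomeB
ATG-CGTACG
""".splitlines()

fractions = {
    name: missing_fraction(seq) for name, seq in read_fasta(example).items()
}
for name, fraction in fractions.items():
    print(f"{name}\t{fraction:.2f}")
```

For a real run, the open file handle for core_alignment.fasta can be passed to `read_fasta` in place of `example`.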

I would suggest you curate the genes you think are useful to align, e.g. those with a similar copy number and length across isolates, before aligning these genes individually. Your dataset is sufficiently small that this sort of manual inspection/analysis would be warranted and efficient.
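As a starting point for that curation, here is a minimal sketch of one possible filter. It is only an illustration, not something PIRATE provides: the function `select_genes`, the input layout (gene, then genome, then a list of copy lengths in bp), and the 1.2 length-ratio cutoff are all assumptions you would need to adapt to your own tables.

```python
def select_genes(lengths_by_gene, n_genomes, max_length_ratio=1.2):
    """Return genes that are single-copy in every genome and similar in length.

    lengths_by_gene: {gene: {genome: [length_bp, ...]}} (hypothetical layout)
    n_genomes: total number of genomes in the dataset
    max_length_ratio: arbitrary cutoff on longest/shortest copy length
    """
    kept = []
    for gene, per_genome in lengths_by_gene.items():
        # Skip genes that are absent from at least one genome.
        if len(per_genome) != n_genomes:
            continue
        # Skip genes with more than one copy (or no copy) in any genome.
        if any(len(copies) != 1 for copies in per_genome.values()):
            continue
        lengths = [copies[0] for copies in per_genome.values()]
        if min(lengths) <= 0:
            continue
        # Keep the gene only if the longest and shortest copies are close.
        if max(lengths) / min(lengths) <= max_length_ratio:
            kept.append(gene)
    return kept


# Toy input (made-up numbers) for two genomes.
toy = {
    "gene1": {"genomeA": [900], "genomeB": [910]},       # similar length
    "gene2": {"genomeA": [900], "genomeB": [400]},       # large length difference
    "gene3": {"genomeA": [500, 500], "genomeB": [500]},  # duplicated in genomeA
    "gene4": {"genomeA": [300]},                         # missing from genomeB
}

selected = select_genes(toy, n_genomes=2)
print(selected)
```

With the toy input only gene1 passes all three checks. The genes that pass could then be aligned one at a time and inspected by eye.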

I hope that helps.

All the best, Sion

gaps in core_alignment.fasta #65