SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

For some single loci, a gene family but for others not. #81

Closed by margotmdr 2 years ago

margotmdr commented 2 years ago

Hi,

When I compared the loci present in the *.tab files of the co-ords folders with the loci present in PIRATE.gene_families.ordered.tsv, I noticed that some locus tags were missing. Some single locus tags are assigned to their own gene family (a gene family containing only 1 locus), but other locus tags aren't assigned to any gene family at all and therefore do not appear in this table. Why is that?

Kind regards, Margot

SionBayliss commented 2 years ago

Hi, sorry for the late reply, I have been moving institutions. The loci might have been filtered out before being passed to the PIRATE pipeline proper. This can happen for several reasons: they are too small (<120 aa, I think), contain erroneous characters, lack start/stop codons, etc. This usually accounts for any missing loci.
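
If it helps, here is a rough Python sketch for checking which of those reasons might apply to a locus that is missing from the table. It is not PIRATE's actual filtering code: the function names, the 120 aa cut-off (taken from my "I think" above) and the start/stop codon sets (the standard bacterial ones) are all assumptions for illustration, so treat the output as a hint rather than a definitive answer.

```python
import re

# Assumed values, NOT taken from the PIRATE source code.
MIN_LENGTH_AA = 120                      # hedged threshold from the reply above
START_CODONS = {"ATG", "GTG", "TTG"}     # common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}      # standard stop codons


def missing_loci(all_loci, assigned_loci):
    """Locus tags seen in the co-ords files but absent from the gene family table."""
    return sorted(set(all_loci) - set(assigned_loci))


def possible_filter_reasons(cds_nt, min_length_aa=MIN_LENGTH_AA):
    """Return the reasons (hypothetical checks) a CDS might have been filtered out."""
    seq = cds_nt.upper()
    reasons = []

    # Anything other than A, C, G or T counts as an erroneous character here.
    if re.search(r"[^ACGT]", seq):
        reasons.append("erroneous characters")

    # Protein length = number of complete codons, excluding a terminal stop codon.
    has_stop = seq[-3:] in STOP_CODONS
    aa_length = len(seq) // 3 - (1 if has_stop else 0)
    if aa_length < min_length_aa:
        reasons.append("too short")

    if seq[:3] not in START_CODONS:
        reasons.append("no start codon")
    if not has_stop:
        reasons.append("no stop codon")

    return reasons
```

You would first collect the locus tags from the co-ords files and from the gene family table yourself (the parsing depends on your file layout, so it is left out here), pass both collections to `missing_loci`, and then run `possible_filter_reasons` on the nucleotide sequence of each missing locus. An empty list means none of these checks explains the missing locus.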