zakandrewking opened 9 years ago
It happens whenever a gene has multiple CDS entries, because component loading parses the GenBank file CDS by CDS. For example, B21_02159 has two CDS entries:
gene 2275802..2277159
/gene="ybl104"
/locus_tag="B21_02159"
CDS join(2275802..2276260,2276260..2277159)
/gene="ybl104"
/locus_tag="B21_02159"
/ribosomal_slippage
/codon_start=1
/transl_table=11
/product="ISEcB1 transposase"
/protein_id="CBY77859.1"
/db_xref="GI:313848701"
/db_xref="EnsemblGenomes:B21_02159"
/db_xref="EnsemblGenomes:CBY77859"
/db_xref="GOA:E5QQH5"
/db_xref="InterPro:IPR001584"
/db_xref="InterPro:IPR009057"
/db_xref="InterPro:IPR011991"
/db_xref="InterPro:IPR012337"
/db_xref="InterPro:IPR025948"
/db_xref="UniProtKB/TrEMBL:E5QQH5"
CDS 2275802..2276323
/gene="ybl104"
/locus_tag="B21_02159"
/codon_start=1
/transl_table=11
/product="ISEcB1 protein A"
/protein_id="CAQ32676.2"
/db_xref="GI:313848700"
/db_xref="EnsemblGenomes:B21_02159"
/db_xref="EnsemblGenomes:CAQ32676"
/db_xref="GOA:C5W711"
/db_xref="InterPro:IPR009057"
/db_xref="InterPro:IPR011991"
/db_xref="UniProtKB/TrEMBL:C5W711"
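To make the coordinate mismatch concrete, here is a minimal Python sketch that reduces a GenBank location string to a (leftpos, rightpos) pair. The function name `location_bounds` is hypothetical, and it assumes every location refers to the current record (no remote `ACCESSION:start..end` parts), so it simply takes the smallest and largest coordinates in the string.

```python
import re


def location_bounds(location: str) -> tuple[int, int]:
    """Return (leftpos, rightpos) for a GenBank feature location string.

    Sketch only: assumes no remote 'ACCESSION:start..end' parts, so every
    integer in the string is a coordinate. Operators such as join(),
    order(), complement() and the '<' / '>' partial markers are ignored.
    """
    coords = [int(n) for n in re.findall(r"\d+", location)]
    if not coords:
        raise ValueError(f"no coordinates in location: {location!r}")
    return min(coords), max(coords)


# The two CDS entries for locus_tag B21_02159 from the excerpt above.
cds_locations = [
    "join(2275802..2276260,2276260..2277159)",  # ISEcB1 transposase
    "2275802..2276323",                         # ISEcB1 protein A
]
for loc in cds_locations:
    print(loc, "->", location_bounds(loc))
```

The joined transposase CDS spans the full gene (2275802..2277159), while protein A stops at 2276323, so loading each CDS separately yields two different coordinate pairs for the same locus tag.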
The only downside to loading CDSs this way is that leftpos and rightpos correspond to a single CDS rather than the whole gene, so these coordinates need to be fixed eventually.
But things work pretty well right now.
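One way to fix leftpos and rightpos is to collapse all CDS entries that share a locus_tag into a single span. The sketch below assumes CDS-level records are available as hypothetical `(locus_tag, leftpos, rightpos)` tuples; `merge_cds_spans` is an illustrative name, not an existing function in the loader.

```python
def merge_cds_spans(cds_records):
    """Collapse per-CDS (locus_tag, leftpos, rightpos) records into one span per gene.

    Hypothetical helper: takes the outermost coordinates over all CDS
    entries that share a locus_tag, so the result covers the whole gene.
    """
    spans = {}
    for locus_tag, left, right in cds_records:
        if locus_tag in spans:
            old_left, old_right = spans[locus_tag]
            spans[locus_tag] = (min(old_left, left), max(old_right, right))
        else:
            spans[locus_tag] = (left, right)
    return spans


# CDS-level records for B21_02159, as loaded CDS by CDS.
records = [
    ("B21_02159", 2275802, 2277159),  # joined transposase CDS
    ("B21_02159", 2275802, 2276323),  # protein A CDS
]
print(merge_cds_spans(records))  # {'B21_02159': (2275802, 2277159)}
```

For B21_02159 this recovers the span of the gene feature (2275802..2277159); reading the gene feature's own location directly would be an alternative when one is present. Taking min/max does break for features that wrap around the origin of a circular genome, where the location joins the end of the sequence to its start.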
Also, we only look at CDS features right now, so genes without a CDS have no match. For example:
WARNING:root:Gene not in genbank file: YPL276W from model iMM904
WARNING:root:Gene not in genbank file: YPL275W from model iMM904
WARNING:root:Gene not in genbank file: 57733_AT1 from model RECON1
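For genes that have no CDS feature (RNA genes or pseudogenes annotated only with a gene feature, for instance), one option is to index several feature types and prefer the CDS when both exist. This is a sketch under assumptions: features are plain dicts with hypothetical `type`, `locus_tag` and `gene` keys rather than parsed Biopython objects, the priority list is illustrative, and `RNA_ONLY_1` is a made-up locus tag. Genes that still have no match log the same warning text as above.

```python
import logging

# Illustrative preference order: use a CDS when one exists, otherwise fall
# back to other feature types that carry the same identifiers.
FEATURE_PRIORITY = ["CDS", "gene", "mRNA", "tRNA", "rRNA", "ncRNA"]
RANK = {ftype: i for i, ftype in enumerate(FEATURE_PRIORITY)}


def build_index(features):
    """Map each locus_tag and gene name to its highest-priority feature.

    `features` is an iterable of dicts with 'type', 'locus_tag' and 'gene'
    keys (a simplified stand-in for parsed GenBank features).
    """
    index = {}
    for feat in features:
        rank = RANK.get(feat["type"])
        if rank is None:
            continue  # skip feature types we do not know how to use
        for key in (feat.get("locus_tag"), feat.get("gene")):
            if key is None:
                continue
            current = index.get(key)
            # Keep the first feature of the best-ranked type seen so far.
            if current is None or rank < RANK[current["type"]]:
                index[key] = feat
    return index


def find_gene(index, gene_id, model_id):
    """Return the matching feature, or log a warning and return None."""
    feat = index.get(gene_id)
    if feat is None:
        logging.warning("Gene not in genbank file: %s from model %s", gene_id, model_id)
    return feat


# Example: B21_02159 has both a gene and a CDS feature; "RNA_ONLY_1" is a
# hypothetical locus tag annotated with a gene feature only.
features = [
    {"type": "gene", "locus_tag": "B21_02159", "gene": "ybl104"},
    {"type": "CDS", "locus_tag": "B21_02159", "gene": "ybl104"},
    {"type": "gene", "locus_tag": "RNA_ONLY_1", "gene": None},
]
index = build_index(features)
print(index["B21_02159"]["type"])   # CDS
print(index["RNA_ONLY_1"]["type"])  # gene
```

Only identifiers that actually appear as a locus_tag or gene qualifier can match this way; model IDs from another namespace would still need a separate mapping.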
This happens a lot during genome loading: