enasequence / webin-cli

Webin command line submission program.
Apache License 2.0

Unable to represent known frameshifts in CDS #100

Open ValWood opened 1 year ago

ValWood commented 1 year ago

We have represented known frameshifts in the genome sequence as 1-2 bp overlaps in CDS features (we were once advised by Ensembl that this was the correct way to annotate these). This is necessary to ensure that the correct CDS are propagated to UniProtKB.

On submission we get this error: ERROR: Intron usually expected to be at least 10 nt long. Please check the accuracy. [ line: 863 of chromosome1.embl.gz]
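To see which CDS features are likely to trigger the error above before submitting, a check along these lines may help. This is a minimal sketch, not the validator's actual logic: the 10 nt threshold comes from the error message, while the function names (`parse_join`, `short_gaps`) and the assumption that the validator measures the gap between consecutive `join` segments are hypothetical. It only handles simple numeric spans.

```python
import re

# Threshold quoted in the error message; how webin-cli applies it is an assumption.
MIN_INTRON_NT = 10


def parse_join(location):
    """Return (start, end) pairs found in a location string such as join(1..5,9..20)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def short_gaps(location, min_len=MIN_INTRON_NT):
    """List (prev_end, next_start, gap) for consecutive segments closer than min_len.

    A negative gap means the two segments overlap, which is how the
    frameshifts in this issue are annotated.
    """
    segments = parse_join(location)
    flagged = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end - 1
        if gap < min_len:
            flagged.append((prev_end, next_start, gap))
    return flagged


if __name__ == "__main__":
    # Hypothetical coordinates with a 2 bp overlap between segments.
    print(short_gaps("join(100..200,199..300)"))
```

Running this over every CDS location in a flat file would give a list of features to review, each with the size of its gap or overlap.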

We can't change the underlying sequence: the fission yeast community has been assured that the sequence will remain stable until a final end-to-end contiguation is available, to avoid massive disruption to functional genomics laboratories.

Please advise (should this be a warning rather than an error?)

amanzanom commented 7 months ago

Same here. I have been in contact with someone from ENA for almost a year and keep getting the answer "we think it is fixed now, can you try it". Every time, it is not fixed. I am also now holding some 50 or more genomes that need to be submitted. This 10 nt minimum for an "intron" is a strange rule.

amanzanom commented 7 months ago

In my last communication, they instructed me to use the qualifier /exception="annotated by transcript or proteomic data". However, this has been of no use: same errors. And yes, this should be a warning, not an error. Frameshifts in functional bacterial genes, for control of expression etc., are extremely common, so I do not get it.

ValWood commented 7 months ago

We have been trying to update a model organism genome since April. I now have at least 77 e-mails of correspondence about this (even though we have done it numerous times in the past, it has never been this difficult). Some requirements have changed since our last submission. We are told this is an INSDC requirement, but it makes the data non-FAIR-compliant and does not seem to apply to NCBI submissions. The documentation is very disorganized, full of jargon, and unhelpful, and the response time is slow. I'm not even sure this tracker is monitored (I tried it as a last resort, but the help desk don't use it or refer to it).

For your issue, you might be able to use /ribosomal_slippage? The guidance for it reads: "a join operator, e.g. [join(486..1784,1787..4810)], should be used in the CDS spans to indicate the location of ribosomal_slippage".
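As an illustration of that suggestion, the sketch below formats a CDS feature with a join location and a /ribosomal_slippage qualifier as EMBL feature-table lines. The helper name `format_cds` is hypothetical, the column layout (feature key from column 6, location and qualifiers from column 22) is my understanding of the flat-file format, and wrapping of long lines at 80 columns is not handled. Whether the resulting record passes Webin validation has not been checked here.

```python
def format_cds(location, qualifiers):
    """Return EMBL-style FT lines for one CDS feature.

    location   -- e.g. "join(486..1784,1787..4810)"
    qualifiers -- qualifier strings, e.g. ["/ribosomal_slippage"]
    """
    # "FT", three spaces, then the feature key padded so the location starts at column 22.
    lines = ["FT   " + "CDS".ljust(16) + location]
    for qualifier in qualifiers:
        # Qualifier lines carry "FT" followed by 19 spaces.
        lines.append("FT" + " " * 19 + qualifier)
    return lines


if __name__ == "__main__":
    # Coordinates taken from the example quoted above.
    for line in format_cds("join(486..1784,1787..4810)", ["/ribosomal_slippage"]):
        print(line)
```

/ribosomal_slippage takes no value, so it is passed as a bare string; qualifiers that do take a value would be passed with the value already quoted.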

Good luck!

amanzanom commented 6 months ago

Hey,

Thanks for the answer, good to know I am not the only one suffering from this. In my case, these are transcriptional slippages, so in theory not ribosomal frameshifts. The latter works well, but I do not want to deposit incorrect data here... Let's hope I can solve this before my 100th email. They are slow, and they keep not really solving the issue.