WormBase / wormbase-pipeline

Wormbase Build Pipeline
http://www.wormbase.org

C. briggsae operons #255

Open MagdalenaZZ opened 1 year ago

MagdalenaZZ commented 1 year ago

Multiple issues with the C. briggsae operons. In the DB there are >1000 operons. Only 49 are dumped to the GFF. None are displayed in the JBrowse track.

MagdalenaZZ commented 1 year ago

(base) /Users/mz3/Downloads % gzcat c_briggsae.PRJNA10731.WS286.annotations.gff3.gz | grep -i operon | head
I dicistronic_mRNA operon 1545621 1551885 . - . Name=CBOP0014
I dicistronic_mRNA operon 4813634 4819290 . + . Name=CBOP0012
I dicistronic_mRNA operon 6131032 6140144 . - . Name=CBOP0020
I dicistronic_mRNA operon 8229353 8238605 . + . Name=CBOP0008
I dicistronic_mRNA operon 8229353 8238605 . + . Name=CBOP0039
I dicistronic_mRNA operon 8240325 8240873 . - . Name=CBOP0046
I dicistronic_mRNA operon 8291903 8292553 . + . Name=CBOP0040
I dicistronic_mRNA operon 8292744 8296537 . + . Name=CBOP0047
I dicistronic_mRNA operon 15371458 15376933 . + . Name=CBOP0023
II dicistronic_mRNA operon 1782353 8352843 . - . Name=CBOP0050
(base) /Users/mz3/Downloads % gzcat c_briggsae.PRJNA10731.WS286.annotations.gff3.gz | grep -i operon | wc -l
49
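For anyone who wants to repeat this check per feature label rather than with a plain `grep`, here is a minimal Python sketch. It assumes the file is standard tab-delimited GFF3 (source in column 2, type in column 3). The function names and sample lines are illustrative and not part of the build pipeline. Unlike `grep -i operon`, it only counts lines whose type column is exactly `operon`, so the totals could differ if "operon" also appears in other columns.

```python
import gzip
from collections import Counter


def count_operon_labels(lines):
    """Count features whose type column is 'operon', keyed as 'type:source'."""
    counts = Counter()
    for line in lines:
        # Skip GFF3 directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        source, ftype = cols[1], cols[2]
        if ftype == "operon":
            counts[f"{ftype}:{source}"] += 1
    return counts


def count_operon_labels_in_file(path):
    """Same count, reading a gzipped GFF3 file from disk (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return count_operon_labels(handle)


# Two feature lines taken from the output above, rebuilt with tabs.
sample = [
    "##gff-version 3\n",
    "\t".join(["I", "dicistronic_mRNA", "operon", "1545621", "1551885",
               ".", "-", ".", "Name=CBOP0014"]) + "\n",
    "\t".join(["I", "dicistronic_mRNA", "operon", "4813634", "4819290",
               ".", "+", ".", "Name=CBOP0012"]) + "\n",
]
print(count_operon_labels(sample))
```

Calling `count_operon_labels_in_file("c_briggsae.PRJNA10731.WS286.annotations.gff3.gz")` on the downloaded file should give the breakdown per label, which makes it easy to see whether anything other than `operon:dicistronic_mRNA` is present.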

scottcain commented 1 year ago

With regard to JBrowse: all of the operons that are in the GFF are displayed in JBrowse 1, but that track hasn't been ported to JBrowse 2 yet (some "species specific" tracks haven't been done). I'm working on it this week.

scottcain commented 1 year ago

Actually what I wrote above is not correct: there are no operons in the C. briggsae JBrowse of any version (I was looking at the wrong thing). Anyway, here's why: in the elegans GFF, there are three types of operons:

operon:operon
operon:deprecated_operon
operon:dicistronic_mRNA

but in the briggsae GFF, there is only operon:dicistronic_mRNA. I believe at some point, the operons in the briggsae GFF might have been something else (perhaps operon:operon) and I didn't catch the change. Anyway, I can update the processing and config to catch the operon:dicistronic_mRNA features. Of course, I can't speak to where the other operons in briggsae are.
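As a rough illustration of what "catching" the extra label could look like, here is a Python sketch that selects GFF3 feature lines by their `type:source` label. This is not the actual JBrowse processing code: `ACCEPTED_LABELS` and `select_operon_lines` are hypothetical names, and the input is assumed to be tab-delimited GFF3.

```python
# Labels in the "type:source" form used above; the set itself is an assumption
# about what a track-building step might want to accept.
ACCEPTED_LABELS = {
    "operon:operon",
    "operon:deprecated_operon",
    "operon:dicistronic_mRNA",
}


def select_operon_lines(lines, accepted=ACCEPTED_LABELS):
    """Return the feature lines whose 'type:source' label is in `accepted`."""
    selected = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # directive, comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        label = f"{cols[2]}:{cols[1]}"  # column 3 is type, column 2 is source
        if label in accepted:
            selected.append(line)
    return selected


example = [
    "\t".join(["I", "dicistronic_mRNA", "operon", "8240325", "8240873",
               ".", "-", ".", "Name=CBOP0046"]) + "\n",
    "\t".join(["I", "WormBase", "gene", "100", "200",
               ".", "+", ".", "Name=hypothetical_gene"]) + "\n",
]
print(len(select_operon_lines(example)))
```

Keeping the accepted labels in one set means a species whose GFF uses a different source for its operons only needs a new entry there, rather than a change to the filtering logic.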

scottcain commented 1 year ago

Since Magdalena said there are >1000 operons in the database for briggsae, and this paper says they identified 1100 operons (https://wormbase.org/resources/paper/WBPaper00041271), I'm guessing those are the operons that are missing in the GFF. I'm going to assign to Mark as a best guess of who could likely fix this. (Actually, I'm not going to assign to @markquintontulloch because I don't appear to have that authority.)

markquintontulloch commented 1 year ago

Will take a look @scottcain

scottcain commented 1 year ago

I don't know what I was on yesterday, but dicistronic mRNAs are in both JBrowse 1 and 2 on both staging (WS288) and production (WS287), e.g.: https://wormbase.org/tools/genome/jbrowse2/?session=share-pXhyUFLl8h&password=IuYtq. Perhaps it was because "Operon" wasn't in the name of the track :-)

Anyway, that at least is good news, since I don't have to track down a bug that was causing them to be overlooked.