WormBase / wormbase-pipeline

Wormbase Build Pipeline
http://www.wormbase.org

GFF generation for C. nigoni #258

Open sdiamantakis opened 1 year ago

sdiamantakis commented 1 year ago

From Scott:

Who should I poke about the GFF generation for C. nigoni? In this line of GFF:

CM008514.1 WormBase_imported mRNA 14313335 14315164 . - . ID=transcript:Cnig_chr_X.g24897;Parent=gene:Cnig_chr_X.g24897;Name=Cnig_chr_X.g24897;info=method:InterPro accession:IPR013750 description:GHMP kinase%2C C-terminal domain %0Amethod:InterPro accession:IPR014721 description:Ribosomal protein S5 domain 2-type fold%2C subgroup %0Amethod:InterPro accession:IPR015192 description:Switch protein XOL-1%2C N-terminal %0Amethod:InterPro accession:IPR015193 description:Switch protein XOL-1%2C GHMP-like %0Amethod:InterPro accession:IPR020568 description:Ribosomal protein S5 domain 2-type fold

There are embedded newlines (%0A), which are making JBrowse 2 do stupid things when trying to dump data back out. Since this is supposed to be read as a single entry, maybe the newlines could be replaced with an unencoded comma, indicating a list of distinct items (a “good” GFF parser would then split that into an array/list).
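
A rough sketch of the suggested replacement, in Python purely for illustration (this is not taken from the pipeline code, and the function names `replace_encoded_newlines` and `parse_attributes` are hypothetical). It assumes the only change needed is swapping each encoded newline in the attributes column for an unencoded comma, leaving `%2C` untouched so that commas inside a description stay part of that value. The sample string is a shortened copy of the attributes from the line above.

```python
import re
from urllib.parse import unquote

# Matches an encoded newline plus any stray spaces around it (assumption:
# the space before each %0A in the dump is incidental and can be dropped).
_ENCODED_NEWLINE = re.compile(r"\s*%0A\s*", re.IGNORECASE)


def replace_encoded_newlines(attributes: str) -> str:
    """Turn '%0A' separators into unencoded commas (multi-value separator)."""
    return _ENCODED_NEWLINE.sub(",", attributes)


def parse_attributes(attributes: str) -> dict:
    """Minimal attribute parser: split on ';', then '=', then unencoded ','."""
    parsed = {}
    for field in attributes.split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # Decode percent-escapes only after splitting, so %2C stays in place.
        parsed[key] = [unquote(item) for item in value.split(",")]
    return parsed


# Shortened version of the attributes column from the issue.
example = (
    "ID=transcript:Cnig_chr_X.g24897;"
    "info=method:InterPro accession:IPR013750 "
    "description:GHMP kinase%2C C-terminal domain "
    "%0Amethod:InterPro accession:IPR014721 "
    "description:Ribosomal protein S5 domain 2-type fold%2C subgroup"
)

fixed = replace_encoded_newlines(example)
info = parse_attributes(fixed)["info"]
for item in info:
    print(item)
```

With this change the `info` attribute parses as a list with one InterPro hit per element, rather than as a single value containing line breaks.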

scottcain commented 1 year ago

This is in support of adding data dumping to JBrowse 2; a user requested the ability to dump GenBank-formatted data (functionality present in GBrowse but not in JBrowse 1). While it mostly works, the embedded newlines are breaking it. The associated issue over at JBrowse is https://github.com/GMOD/jbrowse-components/issues/3094#issuecomment-1476611129