RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Import FlyBase using JSON #157

Closed: AntonPetrov closed this issue 5 years ago

AntonPetrov commented 7 years ago

1,605 FlyBase locus_tags are present in GenBank but missing from ENA. We need to synchronise the two so that all FlyBase ncRNA features appear in the ENA non-coding product.
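Finding the unsynchronised features is essentially a set difference on `/locus_tag` values. A minimal sketch, assuming the locus tags have already been extracted from each source into lists; the function name `missing_from_ena` and every tag except `Dmel_CR31574` are hypothetical:

```python
def missing_from_ena(genbank_tags, ena_tags):
    """Return locus_tags present in GenBank but absent from ENA, sorted."""
    # Sets ignore duplicate entries and make the comparison linear in size.
    return sorted(set(genbank_tags) - set(ena_tags))


# Hypothetical sample data; a real run would cover every FlyBase ncRNA feature.
genbank = ["Dmel_CR31574", "Dmel_CR00001", "Dmel_CR00002"]
ena = ["Dmel_CR31574"]
missing = missing_from_ena(genbank, ena)
print(len(missing), missing)  # 2 ['Dmel_CR00001', 'Dmel_CR00002']
```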

blakesweeney commented 6 years ago

Another aspect of this issue is that we also need more data for each accession. For example, at the meeting someone pointed out that we cannot search by FlyBase transcript IDs. Checking the example entry, URS0000002CB3_7227, shows that we do not store the transcript ID, so it cannot be searched. I think we do have the gene ID.
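The underlying point is that an identifier is only searchable if it is stored with the entry. A toy sketch of an identifier index (not RNAcentral's actual search backend); `build_index`, `search`, `URS_EXAMPLE` and the zero-padded FlyBase IDs are all hypothetical placeholders:

```python
from collections import defaultdict


def build_index(entries):
    """Map every stored identifier to the set of URS ids that carry it."""
    index = defaultdict(set)
    for urs, identifiers in entries.items():
        for identifier in identifiers:
            index[identifier].add(urs)
    return index


def search(index, query):
    """Return the URS ids matching an identifier, or an empty set."""
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get(query, set())


# Only the gene ID is stored, so the transcript ID cannot be found.
before = build_index({"URS_EXAMPLE": ["FBgn0000000"]})
print(search(before, "FBtr0000000"))  # set()

# Storing the transcript ID as well makes it searchable.
after = build_index({"URS_EXAMPLE": ["FBgn0000000", "FBtr0000000"]})
print(search(after, "FBtr0000000"))  # {'URS_EXAMPLE'}
```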

sjm41 commented 6 years ago

Hi Blake

I had assumed transcript IDs (FBtr#) would be included in the ENA accessions and so would be trivial to pull out. However, it seems that neither gene (FBgn) nor transcript (FBtr) IDs are included in the ENA files! Compare the tRNA example in the two attachments: GenBank has the IDs as db_xrefs, ENA does not. (I think you must be getting the FBgn IDs from the xref file we currently submit to ENA with each FlyBase release; it includes FBgn but not FBtr IDs.)

Of course, FlyBase can supply both IDs in direct submissions to you in the proposed JSON format, so this shouldn't be a problem going forward.
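A sketch of what a direct-submission record carrying both identifiers might look like. The field names (`primaryId`, `geneId`, `transcriptId`, `locusTag`, `taxonId`, and the top-level `data` list) are hypothetical placeholders, not the agreed RNAcentral/FlyBase schema; the IDs are those of the tRNA example in the attachments, and 7227 is the NCBI taxon ID for *Drosophila melanogaster*:

```python
import json


def make_entry(gene_id, transcript_id, locus_tag, taxon_id):
    """Build one hypothetical submission record carrying both FlyBase IDs."""
    return {
        # One record per transcript, so the transcript ID is the primary key.
        "primaryId": f"FLYBASE:{transcript_id}",
        "geneId": gene_id,
        "transcriptId": transcript_id,
        "locusTag": locus_tag,
        "taxonId": taxon_id,
    }


entry = make_entry("FBgn0011840", "FBtr0083499", "Dmel_CR31574", 7227)
payload = json.dumps({"data": [entry]}, indent=2)
print(payload)
```

Keying each record on the transcript rather than the gene is a design assumption here; it keeps the FBtr ID searchable even when one gene has several transcripts.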

steven.

(Email reply to Blake Sweeney's comment of 9 Nov 2017, 11:25.)

GenBank attachment:

 gene            complement(17620144..17620216)
                 /gene="tRNA:Ala-AGC-2-3"
                 /locus_tag="Dmel_CR31574"
                 /gene_synonym="AE002708.trna50-AlaAGC"
                 /gene_synonym="Ala"
                 /gene_synonym="chr3R.trna56-AlaAGC"
                 /gene_synonym="CR31574"
                 /gene_synonym="Dmel\CR31574"
                 /gene_synonym="GCU"
                 /gene_synonym="tRNA-Ala-AGC-2-3"
                 /gene_synonym="tRNA:A:90C"
                 /gene_synonym="tRNA:A:AGC:AE002708-k"
                 /gene_synonym="tRNA:ala:90C"
                 /gene_synonym="tRNA[Ala]"
                 /gene_synonym="tRNA[[Ala]]"
                 /note="transfer RNA:Alanine-AGC 2-3; alternate names:
                 transfer RNA:ala:90C"
                 /map="90B6-90B6"
                 /db_xref="FLYBASE:FBgn0011840"
 tRNA            complement(17620144..17620216)
                 /gene="tRNA:Ala-AGC-2-3"
                 /locus_tag="Dmel_CR31574"
                 /gene_synonym="AE002708.trna50-AlaAGC"
                 /gene_synonym="Ala"
                 /gene_synonym="chr3R.trna56-AlaAGC"
                 /gene_synonym="CR31574"
                 /gene_synonym="Dmel\CR31574"
                 /gene_synonym="GCU"
                 /gene_synonym="tRNA-Ala-AGC-2-3"
                 /gene_synonym="tRNA:A:90C"
                 /gene_synonym="tRNA:A:AGC:AE002708-k"
                 /gene_synonym="tRNA:ala:90C"
                 /gene_synonym="tRNA[Ala]"
                 /gene_synonym="tRNA[[Ala]]"
                 /product="tRNA-Ala"
                 /note="tRNA:Ala-AGC-2-3-RA; Dmel\tRNA:Ala-AGC-2-3-RA;
                 CR31574-RA; Dmel\CR31574-RA"
                 /db_xref="FLYBASE:FBtr0083499"
                 /db_xref="FLYBASE:FBgn0011840"

ENA attachment:

FT   gene            complement(17620144..17620216)
FT                   /map="90B6-90B6"
FT                   /gene="tRNA:Ala-AGC-2-3"
FT                   /gene_synonym="AE002708.trna50-AlaAGC"
FT                   /gene_synonym="Ala"
FT                   /gene_synonym="chr3R.trna56-AlaAGC"
FT                   /gene_synonym="CR31574"
FT                   /gene_synonym="Dmel\CR31574"
FT                   /gene_synonym="tRNA:A:AGC:AE002708-k"
FT                   /gene_synonym="tRNA:ala:90C"
FT                   /gene_synonym="tRNA[Ala]"
FT                   /gene_synonym="tRNA[[Ala]]"
FT                   /gene_synonym="GCU"
FT                   /gene_synonym="tRNA-Ala-AGC-2-3"
FT                   /gene_synonym="tRNA:A:90C"
FT                   /locus_tag="Dmel_CR31574"
FT                   /note="transfer RNA:Alanine-AGC 2-3; alternate names:
FT                   transfer RNA:ala:90C"
FT   tRNA            complement(17620144..17620216)
FT                   /gene="tRNA:Ala-AGC-2-3"
FT                   /gene_synonym="AE002708.trna50-AlaAGC"
FT                   /gene_synonym="Ala"
FT                   /gene_synonym="chr3R.trna56-AlaAGC"
FT                   /gene_synonym="CR31574"
FT                   /gene_synonym="Dmel\CR31574"
FT                   /gene_synonym="tRNA:A:AGC:AE002708-k"
FT                   /gene_synonym="tRNA:ala:90C"
FT                   /gene_synonym="tRNA[Ala]"
FT                   /gene_synonym="tRNA[[Ala]]"
FT                   /gene_synonym="GCU"
FT                   /gene_synonym="tRNA-Ala-AGC-2-3"
FT                   /gene_synonym="tRNA:A:90C"
FT                   /locus_tag="Dmel_CR31574"
FT                   /product="tRNA-Ala"
FT                   /note="tRNA:Ala-AGC-2-3-RA; Dmel\tRNA:Ala-AGC-2-3-RA;
FT                   CR31574-RA; Dmel\CR31574-RA"
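The difference between the GenBank and ENA listings of this tRNA can be checked mechanically. A minimal regex-based sketch, not a full GenBank/EMBL parser (Biopython's `SeqIO` would be the usual choice for complete records), that pulls FlyBase gene (FBgn) and transcript (FBtr) IDs out of a feature table; `flybase_ids` is a hypothetical helper and the two inputs are trimmed copies of the tRNA features:

```python
import re

# Matches FlyBase cross-references such as /db_xref="FLYBASE:FBtr0083499".
FLYBASE_XREF = re.compile(r'/db_xref="FLYBASE:(FB(?:gn|tr)\d+)"')


def flybase_ids(feature_text):
    """Split the FlyBase db_xrefs in a feature table into gene and transcript IDs."""
    ids = FLYBASE_XREF.findall(feature_text)
    genes = sorted({i for i in ids if i.startswith("FBgn")})
    transcripts = sorted({i for i in ids if i.startswith("FBtr")})
    return genes, transcripts


genbank_trna = '''
     tRNA            complement(17620144..17620216)
                     /locus_tag="Dmel_CR31574"
                     /db_xref="FLYBASE:FBtr0083499"
                     /db_xref="FLYBASE:FBgn0011840"
'''
ena_trna = '''
FT   tRNA            complement(17620144..17620216)
FT                   /locus_tag="Dmel_CR31574"
'''
print(flybase_ids(genbank_trna))  # (['FBgn0011840'], ['FBtr0083499'])
print(flybase_ids(ena_trna))      # ([], [])
```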

blakesweeney commented 6 years ago

Thanks for pointing that out to us. We will move to the direct submission approach as quickly as we can.

AntonPetrov commented 5 years ago

The FlyBase data imported via JSON are now available on the test website: https://test.rnacentral.org/search?q=expert_db:%22FlyBase%22
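The test-site link is an ordinary search URL whose query is `expert_db:"FlyBase"`, with the double quotes percent-encoded as `%22`. A small sketch of building such links with the standard library; the helper name `search_url` is hypothetical, and the host defaults to the test site from the comment:

```python
from urllib.parse import quote


def search_url(query, host="https://test.rnacentral.org"):
    """Build a search URL, percent-encoding the query but keeping ':' readable."""
    return f"{host}/search?q={quote(query, safe=':')}"


print(search_url('expert_db:"FlyBase"'))
# https://test.rnacentral.org/search?q=expert_db:%22FlyBase%22
```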

The release should take place in the week of Aug 20th.