bioentity / Bioentity.link

get tair gene name lexicon for upub #61

Open kyook opened 4 years ago

kyook commented 4 years ago

I don't know whether there is an API to query for the terms we need, but the GFF files can be downloaded from TAIR here: https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes

It would be good to start using the microPubs as test articles to hone bioentity linking for Arabidopsis.

nickstiffler commented 4 years ago (Jan 13, 2020)

Lenore had mentioned that Arabidopsis papers are supposed to have the locus ID next to the entities. She implied that our tool shouldn't be necessary, but that doesn't seem to be the case for the Arabidopsis micropubs. However, some have locus IDs in the reagents section, and we could link those automatically. I can't remember the format needed for the lexica, but we can grab the locus/gene names from Gramene: http://ensembl.gramene.org/biomart/martview/8b7989a8575e6c8b2defb90ca5f78b3a/8b7989a8575e6c8b2defb90ca5f78b3a?VIRTUALSCHEMANAME=default&ATTRIBUTES=athaliana_eg_gene.default.feature_page.ensembl_gene_id|athaliana_eg_gene.default.feature_page.external_gene_name&FILTERS=&VISIBLEPANEL=resultspanel
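
Automatically linking the locus IDs in the reagents section mostly comes down to recognizing AGI codes in the text. A minimal sketch, assuming the standard AGI pattern (AT, then chromosome 1-5, C or M, then G and five digits); `find_agi_loci` is a hypothetical helper, not part of bioentity.link:

```python
import re

# AGI locus codes: "AT", a chromosome (1-5, C = chloroplast, M = mitochondrion),
# "G", then exactly five digits, e.g. AT1G01010. Case-insensitive because
# papers also write forms like At1g01010.
AGI_PATTERN = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def find_agi_loci(text: str) -> list[str]:
    """Return AGI locus IDs found in text, upper-cased, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in AGI_PATTERN.finditer(text):
        seen.setdefault(match.group(0).upper(), None)
    return list(seen)
```

The word boundaries keep the pattern from matching inside longer tokens, so something like `XAT1G01010` or a six-digit variant is ignored.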

kyook commented 4 years ago

The format for the lexica importer is: MOD_ID, Symbol, Full Name, Synonyms, Notes

I think columns can be empty, but they have to be there. I went and got a list from the Ensembl BioMart. I'll see if I can load it into bioentity.link when I can access it again.
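
A minimal sketch of writing a file in that column order, always emitting every column even when it is empty. It assumes tab-separated output with no header row, which has not been confirmed against the importer; `write_lexicon` and the placeholder values are hypothetical:

```python
import csv
import io

# Column order quoted for the lexica importer.
LEXICON_COLUMNS = ["MOD_ID", "Symbol", "Full Name", "Synonyms", "Notes"]


def write_lexicon(rows: list[dict[str, str]], out) -> None:
    """Write lexicon rows as tab-separated text (the tab delimiter is an assumption).

    Missing values are written as empty fields so that every row still
    has all five columns.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row.get(col, "") for col in LEXICON_COLUMNS])


# Usage with placeholder values: Full Name, Synonyms and Notes stay empty.
buffer = io.StringIO()
write_lexicon([{"MOD_ID": "EXAMPLE_ID", "Symbol": "EXAMPLE1"}], buffer)
print(repr(buffer.getvalue()))
```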

nickstiffler commented 4 years ago

It might be worth linking an Arabidopsis micropub to see if it works. I made the lexicon a while back: https://bioentity.link/#/lexica/4942431

kyook commented 4 years ago

I had linked the plant papers, or at least I thought I had, but I didn't use the correct URL constructor:
https://bioentity.link/#/publication/10.17912%2Fmicropub.biology.000176
https://bioentity.link/#/publication/10.17912%2Fmicropub.biology.000196
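
For reference, the two publication links above follow a simple pattern: the DOI is appended after `#/publication/` with its `/` percent-encoded as `%2F`. A sketch that reproduces only that observed pattern; it is not bioentity.link's own URL constructor, and `publication_url` is hypothetical:

```python
from urllib.parse import quote


def publication_url(doi: str) -> str:
    """Build a bioentity.link publication URL for a DOI.

    safe="" percent-encodes every reserved character, so the "/" in the DOI
    becomes "%2F", matching the micropublication links in this thread.
    """
    return "https://bioentity.link/#/publication/" + quote(doi, safe="")
```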

nickstiffler commented 4 years ago (Jun 13, 2020)

I fixed the URL, but the problem is that identifiers.org doesn't have a config for AT. Maybe we should consider linking directly to the MODs?

kyook commented 4 years ago

I found this at identifiers.org: https://registry.identifiers.org/registry/tair.gene

- Prefix: tair.gene
- Registry URI: https://registry.identifiers.org/registry/tair.gene
- Sample URL: https://identifiers.org/tair.gene:Gene:2200934
- Sample compact identifier: tair.gene:Gene:2200934
- Sample ID (LUI): Gene:2200934
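
The sample URL is just `https://identifiers.org/` followed by the compact identifier, which is the prefix and the local ID (LUI) joined by a colon. A sketch of building such links; the helper names are hypothetical:

```python
def compact_identifier(prefix: str, local_id: str) -> str:
    """Join a registry prefix and a local ID (LUI) into a compact identifier."""
    return f"{prefix}:{local_id}"


def identifiers_org_url(prefix: str, local_id: str) -> str:
    """Build an identifiers.org resolution URL from a prefix and a LUI."""
    return "https://identifiers.org/" + compact_identifier(prefix, local_id)
```

Note that the LUI in the registry sample includes the `Gene:` part, so a lexicon using this prefix would need MOD_IDs of the form `Gene:2200934` rather than AGI codes.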

kyook commented 4 years ago

Actually, I found this in the Identifiers.org registry https://registry.identifiers.org/registry/tair.gene

nickstiffler commented 4 years ago

That could work. We would need to create a lexicon that uses the TAIR gene IDs instead of the locus names (ATXXXXXX). This would also bypass bioentity.link and go directly to TAIR from identifiers.org.

kyook commented 4 years ago

I think ATXXXXXX is the ID; the locus names are things like GER1 and YCF4.

kyook commented 4 years ago

OK, I see that the gene page URL has id=29919. I sent a Slack message to Leonore to ask about it. I can't find a list on their download site that has these IDs together with the gene names.

nickstiffler commented 4 years ago

I think this file has it. We just need to combine the tables. ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_TAIRAccessionID_AGI_mapping.txt
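
A sketch of combining the tables. It assumes the mapping file is tab-separated with the TAIR accession in the first column and the AGI code in the second, and that the BioMart export has the `ensembl_gene_id` and `external_gene_name` columns from the Gramene link above (for Arabidopsis the gene stable IDs should be AGI codes). Both layouts are assumptions to check against the real files, and the helper names are hypothetical:

```python
import csv
from typing import Iterable


def read_pairs(lines: Iterable[str], has_header: bool = False) -> list[tuple[str, str]]:
    """Parse two-column tab-separated lines into (first, second) pairs.

    Blank lines, '#' comments and rows with fewer than two columns are
    skipped; extra columns are ignored.
    """
    rows = csv.reader(lines, delimiter="\t")
    if has_header:
        next(rows, None)  # drop the header line
    pairs = []
    for row in rows:
        if len(row) < 2 or row[0].startswith("#"):
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def join_on_agi(
    accession_to_agi: list[tuple[str, str]],
    agi_to_name: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (TAIR accession, AGI, gene name) rows, matching AGI codes case-insensitively.

    Accessions with no gene name in the second table get an empty name.
    """
    names = {agi.upper(): name for agi, name in agi_to_name}
    return [(acc, agi.upper(), names.get(agi.upper(), "")) for acc, agi in accession_to_agi]
```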

kyook commented 4 years ago

Yes, we will need to combine them. Some authors do use the locus names in their articles, so in addition to a lexicon with the gene names as gene names and the IDs, we will need to make a lexicon with the locus names as gene names.

nickstiffler commented 4 years ago

I will have to see if that works. MOD IDs are supposed to be unique, so in this case the locus name would be more like an alias.
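
One way to handle this, sketched under assumptions: keep one row per unique MOD ID and put any additional names, such as the locus name, into the Synonyms column. The `|` separator for multiple synonyms is a guess about the importer, and `build_lexicon_rows` is hypothetical:

```python
def build_lexicon_rows(records: list[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Collapse (mod_id, symbol, alias) records into one row per unique MOD ID.

    Because MOD IDs must be unique, extra names are collected into the
    Synonyms column instead of producing a second row for the same ID.
    The "|" separator is an assumption about the importer.
    """
    rows: dict[str, dict] = {}  # dicts keep insertion order (Python 3.7+)
    for mod_id, symbol, alias in records:
        row = rows.setdefault(mod_id, {"MOD_ID": mod_id, "Symbol": symbol, "Synonyms": []})
        if not row["Symbol"] and symbol:
            row["Symbol"] = symbol  # fill in a symbol seen on a later record
        if alias and alias != row["Symbol"] and alias not in row["Synonyms"]:
            row["Synonyms"].append(alias)
    return [{**row, "Synonyms": "|".join(row["Synonyms"])} for row in rows.values()]
```

The resulting dicts use the importer's column names, so the missing Full Name and Notes columns can simply be written out as empty fields.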