Open kyook opened 4 years ago
Lenore had mentioned that Arabidopsis papers are supposed to have the locus ID next to the entities. She implied that our tool shouldn't be necessary, but that doesn't seem to be the case for the Arabidopsis micropubs. Some do have locus IDs in the reagents section, though, and we could link those automatically. I can't remember the format needed for the lexica, but we can grab the locus/gene names from Gramene: http://ensembl.gramene.org/biomart/martview/8b7989a8575e6c8b2defb90ca5f78b3a/8b7989a8575e6c8b2defb90ca5f78b3a?VIRTUALSCHEMANAME=default&ATTRIBUTES=athaliana_eg_gene.default.feature_page.ensembl_gene_id|athaliana_eg_gene.default.feature_page.external_gene_name&FILTERS=&VISIBLEPANEL=resultspanel
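For the automatic linking of locus IDs in the reagents section, a minimal sketch of how the IDs could be picked out of text. It assumes the IDs follow the usual AGI pattern (AT, a chromosome of 1-5, C, or M, then G and five digits); the function name and the sample sentence are made up for illustration and are not part of bioentity.link.

```python
import re

# Assumed AGI locus pattern, e.g. AT1G01010. Case-insensitive because
# authors also write At1g01010.
AGI_PATTERN = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def find_agi_loci(text):
    """Return distinct AGI locus codes, upper-cased, in order of first appearance."""
    seen = []
    for match in AGI_PATTERN.finditer(text):
        code = match.group(0).upper()
        if code not in seen:
            seen.append(code)
    return seen


# Made-up reagents text, for illustration only.
reagents = "Col-0 wild type; mutant lines for At1g01010 and AT3G12345 (also AT1G01010)."
print(find_agi_loci(reagents))
```

This only finds candidate IDs; deciding what each one links to would still depend on the lexicon.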
The format for the lexica importer is: MOD_ID, Symbol, Full Name, Synonyms, Notes.
I think columns can be empty, but they have to be there. I went and got a list from the Ensembl BioMart. I'll see if I can load it into bioentity.link when I can access it again.
k
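A rough sketch of turning a two-column BioMart export (gene ID, gene name) into that five-column layout. The tab delimiter, the header row in the export, and the function name are all assumptions, since the thread does not say what the importer actually expects; empty columns are written out so that every row still has five fields.

```python
import csv
import io


def biomart_to_lexica(biomart_tsv):
    """Convert 'gene ID <tab> gene name' text into five-column lexica rows.

    Columns written: MOD_ID, Symbol, Full Name, Synonyms, Notes.
    Assumes a tab-separated export whose first line is a header.
    """
    reader = csv.reader(io.StringIO(biomart_tsv), delimiter="\t")
    next(reader, None)  # skip the assumed header row
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        gene_id = row[0].strip()
        symbol = row[1].strip() if len(row) > 1 else ""
        # Full Name, Synonyms, and Notes are left empty but present.
        writer.writerow([gene_id, symbol, "", "", ""])
    return out.getvalue()


# Illustrative input only; the second gene is shown without a name on purpose.
sample = "Gene stable ID\tGene name\nAT1G01010\tNAC001\nAT1G01020\t\n"
print(biomart_to_lexica(sample))
```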
On Mon, Jan 13, 2020 at 8:23 PM nickstiffler notifications@github.com wrote:
Lenore had mentioned that Arabidopsis papers are supposed to have the locus ID next to the entities. She implied that our tool shouldn't be necessary, but that doesn't seem to be the case for the Arabidopsis micropubs. Some have locus IDs in the reagents section, and we could link those automatically. I can't remember the format needed for the lexica, but you can grab the locus/gene names from gramene: http://ensembl.gramene.org/biomart/martview/8b7989a8575e6c8b2defb90ca5f78b3a/8b7989a8575e6c8b2defb90ca5f78b3a?VIRTUALSCHEMANAME=default&ATTRIBUTES=athaliana_eg_gene.default.feature_page.ensembl_gene_id|athaliana_eg_gene.default.feature_page.external_gene_name&FILTERS=&VISIBLEPANEL=resultspanel
I did this a while back. https://bioentity.link/#/lexica/4942431 It might be worth linking an Arabidopsis micropub to see if it works.
I had linked the plant papers, or at least I thought I did: https://bioentity.link/#/publication/10.17912%2Fmicropub.biology.000176 https://bioentity.link/#/publication/10.17912%2Fmicropub.biology.000196 I didn't use the correct URL constructor, though.
I fixed the URL, but the problem is that identifiers.org doesn't have a config for AT. Maybe we should consider linking directly to MODs?
I found this at identifiers.org https://registry.identifiers.org/registry/tair.gene
Prefix: tair.gene
Registry URI: https://registry.identifiers.org/registry/tair.gene
Sample URL: https://identifiers.org/tair.gene:Gene:2200934
Sample compact identifier: tair.gene:Gene:2200934
Sample ID (LUI): Gene:2200934
Karen Yook
On Sat, Jun 13, 2020 at 8:46 PM nickstiffler <notifications@github.com> wrote:
>
> I fixed the URL, but the problem is the identifiers.org doesn't have a config for AT. Maybe we should consider linking directly to MODs?
Actually, I found this in the Identifiers.org registry https://registry.identifiers.org/registry/tair.gene
That could work. We would need to create a lexicon that uses the TAIR gene IDs instead of the locus names (ATXXXXXX). This would also bypass bioentity.link and go directly to TAIR from identifiers.org.
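As a sketch of what the link construction would look like with that registry entry: the URL shape below is taken from the sample URL on the tair.gene registry page (prefix, colon, local ID). The function name is hypothetical and this is not the existing bioentity.link URL constructor.

```python
def identifiers_org_url(prefix, local_id):
    """Build an identifiers.org resolver URL from a prefix and a local ID (LUI)."""
    return f"https://identifiers.org/{prefix}:{local_id}"


# Uses the sample ID shown on the tair.gene registry page.
print(identifiers_org_url("tair.gene", "Gene:2200934"))
```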
I think the ATXXXXXX is the ID, the locus names are like GER1, YCF4
OK, I see that the gene page URL has id=29919. I sent a Slack message to Leonore to ask about it. I cannot find a list on their download site that has these IDs and gene names.
I think this file has it. We just need to combine the tables. ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_TAIRAccessionID_AGI_mapping.txt
Yes, we will need to combine them. Some authors do use the locus names in their articles, so we will need to make a lexicon with the locus names as gene names, in addition to a lexicon with the gene names as gene names with the IDs.
I will have to see if that works. MOD IDs are supposed to be unique, so in this case the locus name would be more like an alias.
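One way the combined table could look, as a sketch only. It assumes the mapping file gives one TAIR accession ID per ATxGxxxxx code and that gene symbols come from a separate ID-to-symbol list such as the BioMart export; whether these accession IDs are the same ones the tair.gene prefix expects has not been checked here. The accession numbers in the example are invented, and the function name is hypothetical. The ATxGxxxxx code goes in the Synonyms column so it acts as an alias while the MOD_ID stays unique.

```python
def build_lexicon_rows(accession_to_agi, agi_to_symbol):
    """Combine the two tables into MOD_ID, Symbol, Full Name, Synonyms, Notes rows.

    accession_to_agi: TAIR accession ID -> ATxGxxxxx code (assumed mapping-file content)
    agi_to_symbol:    ATxGxxxxx code -> gene symbol (may be missing for some genes)
    """
    rows = []
    for accession_id, agi in sorted(accession_to_agi.items()):
        symbol = agi_to_symbol.get(agi, "")
        if symbol:
            # The symbol is the primary name; the ATxGxxxxx code is kept as an alias.
            rows.append([f"Gene:{accession_id}", symbol, "", agi, ""])
        else:
            # No symbol known: fall back to the ATxGxxxxx code as the name.
            rows.append([f"Gene:{accession_id}", agi, "", "", ""])
    return rows


# Invented accession IDs, for illustration only.
mapping = {"1000001": "AT1G01010", "1000002": "AT1G01020"}
symbols = {"AT1G01010": "NAC001"}
for row in build_lexicon_rows(mapping, symbols):
    print("\t".join(row))
```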
The site to download GFF files is here. I don't know if there is an API to query for the terms we need. https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes
It would be good to start using the microPubs as test articles to hone the bioentity linking for Arabidopsis.