WormBase / ACKnowledge

Author Curation to Knowledgebases
MIT License

Curation request for WBPaper00060476 #204

Closed vanaukenk closed 3 years ago

vanaukenk commented 3 years ago

Hi,

Just wanted to make a ticket for this in the AFP tracker so we don't lose sight of it.

We had an author request to first-pass their paper, WBPaper00060476, which hadn't yet been run through the pipeline, probably because it was published in October 2020 and is still a 'temp' PDF.

https://github.com/WormBase/website/issues/8243

@valearna, when you can, could you try to process this paper? I can then check that it's okay before we send it to the author.

Thank you!

valearna commented 3 years ago

Thanks for creating the ticket @vanaukenk. I'll process the paper as soon as possible.

valearna commented 3 years ago

I processed the paper on the dev AFP. @vanaukenk can you take a look and let me know if everything is ok before I add it to the production site?

http://textpressocentral.org:3000/overview?paper=00060476&passwd=1623072790.4876623&title=ENPL-1%2C%20the%20%3Ci%3ECaenorhabditis%20elegans%3C/i%3E%20homolog%20of%20GRP94%2C%20promotes%20insulin%20secretion%20via%20regulation%20of%20proinsulin%20processing%20and%20maturation.&journal=Development&pmid=33037039&personid=304&hide_genes=false&hide_alleles=false&hide_strains=false&doi=10.1242/dev.190082

vanaukenk commented 3 years ago

Thank you, @valearna. I'll take a look right now.

vanaukenk commented 3 years ago

@valearna I've looked over the results in the link above for WBPaper00060476 and, overall, the processing seems to have worked fine.

There are two variations I want to check with you, though:

tm2457 (WBVar00251341)
tm3738 (WBVar00252346)

Given how often they are mentioned in the paper and in the broader C. elegans literature, I would have expected them to pass our tf/idf criteria for variations, so I just wanted to check that there wasn't something about their extraction that caused them to be missed.

This is in contrast to mgDf50 (WBVar00088974), another variation in the paper that wasn't extracted; its omission is expected, though, since it has few mentions in the paper and is otherwise widely cited in the literature.
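For readers unfamiliar with the criterion: tf/idf weights a term by how often it appears in one paper against how many papers in the corpus mention it, which is why a variation with few in-paper mentions that is cited across much of the literature is expected to score low. The sketch below uses the textbook formulation with +1 smoothing; the AFP's actual formula, corpus, and cutoff are not given in this thread, and all counts are hypothetical.

```python
import math


def tf_idf(count_in_paper: int, tokens_in_paper: int,
           papers_with_term: int, total_papers: int) -> float:
    """Textbook tf-idf: in-paper term frequency times smoothed log inverse document frequency."""
    tf = count_in_paper / tokens_in_paper
    # +1 smoothing avoids division by zero for terms that no other paper mentions.
    idf = math.log(total_papers / (1 + papers_with_term))
    return tf * idf


# Hypothetical counts for illustration only; not measured from WBPaper00060476.
TOTAL_PAPERS = 10_000
TOKENS_IN_PAPER = 5_000

# A variation mentioned often in this paper but in few other papers.
frequent_here = tf_idf(20, TOKENS_IN_PAPER, 50, TOTAL_PAPERS)
# A variation mentioned rarely here but cited across much of the literature.
widely_cited = tf_idf(2, TOKENS_IN_PAPER, 2_000, TOTAL_PAPERS)

print(f"frequent_here={frequent_here:.4f} widely_cited={widely_cited:.4f}")
```

Whatever cutoff is used, the second kind of term ends up far below the first, because both its in-paper frequency and its inverse document frequency are small.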

I can also check the production pipeline results, if you'd like.

This will be an interesting paper to send to the author; there is definitely some new and missing information that they could supply.

Thanks!

valearna commented 3 years ago

Thanks, @vanaukenk. tm2457 and tm3738 don't have an associated gene in the DB (at least not in the obo_data_variation table on mangolassi), and this is why they were not extracted. I don't remember why I added the filter to exclude variations without a 'gene' entry in that table, but we can remove it if it doesn't make sense. This filter is part of the new Python library and does not affect the pipeline currently in production.
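The filter described here amounts to a lookup: any extracted variation whose gene field is empty is dropped. This is a minimal sketch that assumes a plain dictionary in place of the obo_data_variation table. The function name, the placeholder variation "ex100", and the gene ID "WBGene_EXAMPLE" are hypothetical and are not taken from the new Python library.

```python
from typing import Dict, List, Optional


def keep_variations_with_gene(extracted: List[str],
                              variation_to_gene: Dict[str, Optional[str]]) -> List[str]:
    """Return only the extracted variations that map to a non-empty gene entry.

    Variations missing from the lookup are treated the same as variations with no gene.
    """
    return [v for v in extracted if variation_to_gene.get(v)]


# Hypothetical stand-in for obo_data_variation: the two tm alleles had no gene entry,
# while "ex100" -> "WBGene_EXAMPLE" is an invented placeholder pair.
lookup: Dict[str, Optional[str]] = {
    "tm2457": None,
    "tm3738": None,
    "ex100": "WBGene_EXAMPLE",
}

print(keep_variations_with_gene(["tm2457", "tm3738", "ex100"], lookup))  # prints ['ex100']
```

This shows why the two tm alleles disappear from the dev results: the check depends entirely on what the table holds, not on how often the alleles are mentioned.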

vanaukenk commented 3 years ago

Thanks, @valearna. I think it makes sense, as a sanity check, to exclude variations that don't have a 'gene' entry, although I also don't remember exactly why we put that filter in place. These two variations are associated with genes in WB, so can we check what is in the obo_data_variation table on tazendra?

vanaukenk commented 3 years ago

Actually, never mind. I realized I can check this in the OA. The gene associations aren't there either, but I'm not sure why.

azurebrd commented 3 years ago

@vanaukenk, we get the variations nightly from ftp://ftp.ebi.ac.uk/pub/databases/wormbase/STAFF/nightly_geneace/variations.ace.gz. The first one only has:

Variation : "WBVar00251341"
Public_name "tm2457"
Species "Caenorhabditis elegans"
Live
Reference "WBPaper00051473"
Method "NBP_knockout_allele"

The second has:

Variation : "WBVar00252346"
Public_name "tm3738"
Species "Caenorhabditis elegans"
Live
Method "NBP_knockout_allele"
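One quick way to see what the nightly file contains is to parse each Variation record and check whether it carries a gene tag. This is a minimal sketch under two assumptions: records are laid out one tag per line and separated by blank lines, and a variation-gene link would appear under a tag named Gene. The helper names are hypothetical. shlex is used only to split double-quoted values, so the sketch does not handle every .ace construct.

```python
import shlex
from typing import Dict, List, Optional

AceRecord = Dict[str, List[str]]


def parse_ace_records(text: str) -> Dict[str, AceRecord]:
    """Parse blank-line-separated .ace objects into {object_id: {tag: [values]}}."""
    records: Dict[str, AceRecord] = {}
    current: Optional[AceRecord] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            current = None  # A blank line ends the current object.
            continue
        # shlex strips the double quotes, e.g. "Caenorhabditis elegans" -> one token.
        tokens = shlex.split(line)
        if len(tokens) >= 3 and tokens[1] == ":":
            # Object header such as: Variation : "WBVar00251341"
            current = records.setdefault(tokens[2], {})
        elif current is not None:
            # Tag line: first token is the tag, the rest are its values (possibly none, e.g. Live).
            current.setdefault(tokens[0], []).extend(tokens[1:])
    return records


def has_gene(record: AceRecord) -> bool:
    """Assumes the variation-gene link would be stored under a tag named Gene."""
    return "Gene" in record


SAMPLE = '''Variation : "WBVar00251341"
Public_name "tm2457"
Species "Caenorhabditis elegans"
Live
Reference "WBPaper00051473"
Method "NBP_knockout_allele"

Variation : "WBVar00252346"
Public_name "tm3738"
Species "Caenorhabditis elegans"
Live
Method "NBP_knockout_allele"
'''

for var_id, rec in parse_ace_records(SAMPLE).items():
    print(var_id, rec["Public_name"][0], "has gene" if has_gene(rec) else "no gene")
```

Run on the two records quoted in this comment, both variations report "no gene", which matches what the OA shows.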

vanaukenk commented 3 years ago

Thanks @azurebrd I'll discard my email draft :-)

The information in the nightly dump from the EBI seems incomplete, then.

I also just checked some other tm variations in the OA, e.g., tm902 and tm300. Like tm2457 and tm3738, they have genes associated in WB but not in the OA.

Shall we create a ticket in the website tracker to ask Hinxton about this?

azurebrd commented 3 years ago

@vanaukenk Sure :) As long as they put what you want in that FTP file, the script will pick it up when it runs at 8 pm Pacific time.

vanaukenk commented 3 years ago

Note: added this issue to the 2021-06-10 Caltech conference call

vanaukenk commented 3 years ago

@valearna @draciti

Shall we go ahead and run this paper through the production pipeline?

I think the issues with the tm variation-gene associations might take some time to sort out at WB, and since the filter is not in place in production, we should, in theory, pick up the two missing tm variations.

valearna commented 3 years ago

The current production pipeline would discard the paper entirely, since it is a temp PDF. I will run the dev pipeline on tazendra and manually add the two variations. The dev pipeline writes data in the same format as the production pipeline, so the results will be compatible with the production form.

vanaukenk commented 3 years ago

Oh, darn, that's right.

@valearna, thanks for running this through again and manually adding the two tm variations.

I didn't expect to run into these sorts of issues with this paper, but it was probably good to see what's going on behind the scenes so we can make any necessary changes to the variation-gene pipeline at Caltech.

valearna commented 3 years ago

Thanks for checking the extracted entities and for catching the issue, @vanaukenk.

valearna commented 3 years ago

Here's the link to the production form for paper 60476: https://tinyurl.com/y2cyfkzo

vanaukenk commented 3 years ago

Looks good, @valearna. Should I send the link to the author who wrote to WB, or will an email be generated automatically? Note that there are two corresponding authors on this paper; the author who wrote to us (Kao) is one of the two.

valearna commented 3 years ago

Since I processed the paper manually in debug mode, the email was sent to me. I'll forward it to you so that you can send it directly to the authors.

vanaukenk commented 3 years ago

Got it, thanks!

valearna commented 3 years ago

This issue is now being managed as a WB issue. Marking as a duplicate of https://github.com/WormBase/website/issues/8262