bioentity / papers

Production paper repository

10.1093/genetics/iyad153 #1751

Open bioentity-bot opened 1 year ago

bioentity-bot commented 1 year ago

https://bioentity.link//#/publication/10.1093/genetics/iyad153

bioentity-bot commented 1 year ago

https://bioentity.link//#/publication/10.1093/genetics/iyad153 @kyook New publication uploaded.

kyook commented 1 year ago

Hi @suzialeksander, I ran the YeastGENEonly keyword set on this article and it didn't find anything. I'll be available today if you want to Zoom about it and about fixing the lexica and keyword sets for you.

kyook commented 1 year ago

Hi @suzialeksander, I updated the lexica using the files you sent. To confirm, the strain list only had 55 objects; let me know if that should not be the case.

I created a new keyword set based on these updated lexica and used it to link this paper. Please have a look. The Greek 'delta' (Δ) does not get identified; if it seems necessary to fix this, let me know so we can have Nick work on it.

kyook commented 1 year ago

@suzialeksander there is an object rad5Δ, but it isn't in any of the lexica that I can find. Should it be? (Sorry if I missed it.)

suzialeksander commented 1 year ago

No worries, that's just shorthand for a Rad5 deletion, so I'd link the rad5 but not the Δ if possible.
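
For illustration only, here is a minimal Python sketch of linking the gene name but not the trailing Δ. The pattern and the function name `link_span` are assumptions made for this example, not part of the linking tool. It accepts both the Greek capital delta (U+0394) and the look-alike increment sign (U+2206), since either may appear in manuscripts.

```python
import re

# Assumed pattern: three letters plus digits (e.g. rad5), optionally
# followed by a deletion marker. U+0394 is Greek capital delta and
# U+2206 is the visually similar increment sign.
GENE_WITH_DELTA = re.compile(r"([A-Za-z]{3}\d+)([\u0394\u2206]?)")

def link_span(token):
    """Split a token such as 'rad5Δ' into (text_to_link, unlinked_suffix).

    Returns None when the token does not look like a gene name.
    """
    match = GENE_WITH_DELTA.fullmatch(token)
    if match is None:
        return None
    return match.group(1), match.group(2)

if __name__ == "__main__":
    # The gene name is linked; the deletion marker is left as plain text.
    print(link_span("rad5\u0394"))
```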

suzialeksander commented 1 year ago

@kyook Sorry I missed the first comment. Yes, we don't have a lot of strains, so that list should be short.

It looks like we forgot to add names of the form [A-Z][a-z][a-z]\d+ (e.g. Act1) to the list. Technically, Act1 should be either the gene act1, the gene ACT1, or the protein Act1p. The third column of the lexica is for synonyms, correct? I've added a column so the "shortcut" name is linked, since they're going to the same place, but we would prefer to link the correct name as a priority if possible: if the tool sees Aac1p, we'd prefer that the whole name be linked, not just the Aac1.

Yeast_ProteinsLexicon083023.txt

The new file has columns like:

S000004660 Aac1p Aac1
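
As a rough sketch of the "link the whole name" preference, the Python below parses rows shaped like the example line and picks the longest name that matches at a given position. It assumes whitespace-separated columns (ID, name, then any shortcut names); the function names are hypothetical and this is not the tool's actual matching logic.

```python
def parse_lexicon(text):
    """Parse lexicon rows into (entity_id, name, shortcuts) tuples.

    Assumes whitespace-separated columns; rows with fewer than two
    fields are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        entries.append((fields[0], fields[1], fields[2:]))
    return entries

def longest_match(text, start, names):
    """Return the longest candidate name found in `text` at `start`, or None."""
    best = None
    for name in names:
        if text.startswith(name, start) and (best is None or len(name) > len(best)):
            best = name
    return best

if __name__ == "__main__":
    rows = parse_lexicon("S000004660 Aac1p Aac1\n")
    candidates = [rows[0][1]] + rows[0][2]
    # With both names available, the full protein name wins over the shortcut.
    print(longest_match("Aac1p was purified", 0, candidates))
```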

kyook commented 1 year ago

Hi @suzialeksander, OK, I think I understand yeast entities better now. The best way to deal with variations of approved names for an entity is to include each one as a primary name. Since there can only be one name in column two, the ID would get a separate entry in the lexicon for each approved name.

Names in the synonym column are not used to link. Rather, they are used by curators to validate the entity and by the scripts to predict entities to link (Nick will have to work on scripting the prediction part at some point).

An added benefit of doing it this way is that it enforces correct nomenclature: if the authors do not follow the naming rules, the object does not get linked (or does not get linked correctly). Since we should be able to feed back to the editors of these papers, you can ask authors to change how they refer to their entities if they want them linked.

So I wouldn't worry about the 'shortcut' names, since SGD doesn't want them commonly used.

The protein lexicon should have all protein names with the 'p' appended in the public name column. Let me know if you want me to fix this.
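
To make the one-name-per-row scheme described in this comment concrete, here is a small Python sketch. It assumes rows of (ID, primary name, synonyms); the ID `ID0001` is a placeholder for the example, and the function names are hypothetical. Only primary names go into the index used for linking, synonyms are kept separately for validation, and a helper flags protein names missing the 'p' suffix.

```python
def build_indexes(rows):
    """Build (link_index, synonym_index) from (id, primary, synonyms) rows.

    link_index maps each primary name to its ID and is the only index
    used for linking; synonym_index is kept for validation only.
    """
    link_index = {}
    synonym_index = {}
    for entity_id, primary, synonyms in rows:
        link_index[primary] = entity_id
        for synonym in synonyms:
            synonym_index.setdefault(synonym, set()).add(entity_id)
    return link_index, synonym_index

def names_missing_p(rows):
    """Return primary names in a protein lexicon that do not end in 'p'."""
    return [primary for _, primary, _ in rows if not primary.endswith("p")]

if __name__ == "__main__":
    # Placeholder ID; each approved name gets its own row.
    rows = [
        ("ID0001", "ACT1", []),
        ("ID0001", "act1", []),
        ("ID0001", "Act1p", ["Act1"]),
    ]
    link_index, synonym_index = build_indexes(rows)
    print("Act1p" in link_index, "Act1" in link_index)
```

Under this sketch, a manuscript that writes Act1 would not be linked, which matches the nomenclature-enforcing behavior described above.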

bioentity-bot commented 1 year ago

https://bioentity.link//#/publication/10.1093/genetics/iyad153

Link validation report

Validated Links

Invalidated Links

bioentity-bot commented 1 year ago

https://bioentity.link//#/publication/10.1093/genetics/iyad153 @suzialeksander marked publication ready for publisher review. @Benet1983 The article is now available on your server. @kyook