bioentity / Bioentity.link

"Link all" not propagating link (when linked entity not flanked by whitespace on both sides?) #21

Closed suzialeksander closed 4 years ago

suzialeksander commented 5 years ago

Is this the place for bug reports about the tool?

Was working on https://bioentity.link//#/publication/10.1534/genetics.119.302435

Trying to link rad9 in the phrase "rad9-deficient" using the link all feature. Other occurrences of "rad9-deficient" and related "rad9-deficiency" do not link and all must be linked individually. I have a screen recording of this behaviour but it seems GitHub doesn't like movies.

I had similar resistance in the old tool when what I wanted to link wasn't the whole "non-whitespace" word, or when the entity wasn't italicized properly. Not sure if that matters.
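A minimal sketch of why this could happen, assuming (the thread does not confirm this) that "Link all" only matches an entity with whitespace on both sides. The function names are hypothetical and this is not the tool's actual code; it only contrasts a whitespace-delimited match with one that rejects adjacent letters and digits.

```python
import re

def whitespace_delimited(entity, text):
    """Match the entity only when whitespace (or the text edge) is on both sides."""
    pattern = re.compile(r"(?<!\S)" + re.escape(entity) + r"(?!\S)")
    return [m.start() for m in pattern.finditer(text)]

def alphanumeric_delimited(entity, text):
    """Match the entity unless a letter or digit touches it on either side."""
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(entity) + r"(?![A-Za-z0-9])"
    )
    return [m.start() for m in pattern.finditer(text)]

text = "The rad9 mutant and rad9-deficient cells show rad9-deficiency."
print(whitespace_delimited("rad9", text))    # only the standalone "rad9"
print(alphanumeric_delimited("rad9", text))  # also the two hyphenated forms
```

Under that assumption, the first function skips "rad9-deficient" and "rad9-deficiency" because the hyphen is not whitespace, while the second finds all three occurrences.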

suzialeksander commented 5 years ago

In paper 302528, I had to manually find and link each occurrence of pif1 in the phrase " pif1-m2".

This issue is pretty major, as marking up a paper can take an hour or so more by having to manually link each occurrence, especially if there are multiple entities not linking. Manually linking up is also very error-prone and a lot of links were missed, leaving a lot of work for the second step (checking the proof). Tagging @nathandunn as it seems to be a coding issue.

kyook commented 5 years ago

Please apply this fix to worm papers as well. For worm however, the linking should only be suppressed if the entity is followed or preceded by a colon.

https://bioentity.link/#/publication/10.1534/genetics.119.302625

Authors have a lot of 'entity;entity' expressions where the genes did not get linked. I started linking them one by one, but there are enough of them that it is worthwhile to put in a fix for papers, especially going forward. Thanks.
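One way the species-specific rule could be expressed is sketched below. The table `BLOCKING_NEIGHBORS`, the function name, and the sample text are all hypothetical; the sketch assumes the linker knows which species a paper belongs to, which the thread does not state.

```python
import re

# Hypothetical per-species setting: characters that suppress a link
# when they sit directly before or after the entity.
BLOCKING_NEIGHBORS = {"worm": ":", "yeast": ""}

def linkable_spans(entity, text, species):
    """Return (start, end) spans of the entity that may be linked."""
    blocked = BLOCKING_NEIGHBORS.get(species, "")
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(entity) + r"(?![A-Za-z0-9])"
    )
    spans = []
    for m in pattern.finditer(text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # Guard against "" because the empty string is "in" every string.
        if (before and before in blocked) or (after and after in blocked):
            continue
        spans.append(m.span())
    return spans

sample = "unc-119;dpy-5 and unc-119::GFP"  # made-up text for illustration
print(linkable_spans("unc-119", sample, "worm"))
print(linkable_spans("unc-119", sample, "yeast"))
```

With the worm setting, the occurrence next to the semicolon is linkable and the one followed by a colon is skipped; with the yeast setting both are linkable.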

nickstiffler commented 5 years ago

Words with semicolons should get linked now.

kyook commented 5 years ago

Great, I’ll check it out

suzialeksander commented 5 years ago

This is still not working and probably adds at least 30-60 minutes to curation time on papers that have a lot of links affected, at 15-20 seconds per link at my fastest working speed. Propose to move to Showstopper level as it's increasingly frustrating to use the linkup tool.

kyook commented 5 years ago

Hi Suzie,

I'm sorry about the linking. You shouldn't have to do the linking manually at all, but if you do, it should be only in rare cases. When you see that there is a pattern to why an entity isn't getting linked, we should address it through fiddling with the script or in cases where it is an author-specific formatting, it should be fixed by Sheridan during the proof stage.

For this paper can you tell us which entities you had to link? I see things like hyphens, double semicolons and delta symbols, it could be that we need to allow entities to be followed or preceded by these characters for them to be recognized and linked.

Again, sorry about this.


suzialeksander commented 5 years ago

https://bioentity.link//#/publication/10.1534/genetics.119.302700 I had to manually link

nickstiffler commented 5 years ago

It looks like I need to add the following exceptions:

I am not sure what to do about the prIME1. It can't just look for substrings because there would be too many false links. Should it look for a change in case and try to link or are there set characters that it should match (I see pr and oe)?
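The "change in case" option could look roughly like the sketch below. It is an illustration only, with a hypothetical function name, and it assumes the entity is written in upper case and the prefix is a short run of lower-case letters. The strings "oeIME1" and "TIME1" are made up for the example.

```python
import re

def link_after_case_change(entity, text, max_prefix=2):
    """Find an upper-case entity that directly follows a short lower-case prefix."""
    if not entity[:1].isupper():
        return []  # the heuristic relies on a lower-to-upper boundary
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(?P<prefix>[a-z]{1," + str(max_prefix) + r"})"
        r"(?P<entity>" + re.escape(entity) + r")(?![A-Za-z0-9])"
    )
    return [(m.group("prefix"), m.span("entity")) for m in pattern.finditer(text)]

sample = "prIME1 and oeIME1 but not TIME1 or ime1"
print(link_after_case_change("IME1", sample))
```

Compared with plain substring matching, this does not link "IME1" inside "TIME1", but it also cannot help for entities written in lower case, so a fixed set of prefixes may still be the safer choice.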

kyook commented 5 years ago

I did not want subscripted entities linked in worm papers, however it is easier to remove links than to not have them. So if you can't do species-specific linking, opt for applying the subscript linking to all.

I am assuming that there will only be a set of characters that should match wrt pr and oe, but let's ask @suzialeksander first.

nickstiffler commented 5 years ago

Note to self: delta is being encoded as &#x0394;
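For reference, `&#x0394;` is the HTML numeric character reference for U+0394 (Greek capital letter delta). A small sketch of decoding it before matching, using Python's standard `html.unescape`; the function name and the sample string are hypothetical and say nothing about how the tool itself handles the text.

```python
import html

def decode_entities(fragment):
    """Turn character references such as &#x0394; into the characters they stand for."""
    return html.unescape(fragment)

encoded = "rlm1&#x0394;"
decoded = decode_entities(encoded)
print(decoded.endswith("\u0394"))  # True once the reference is decoded
```

Matching against the decoded text means a rule for a trailing delta only needs to look for one character rather than for the eight-character encoded form.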

nickstiffler commented 5 years ago

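Is it important for the delta to be part of the link? [image: links] https://user-images.githubusercontent.com/2396480/67631074-bd628a00-f84e-11e9-9c98-be23204b8053.png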

kyook commented 5 years ago

Hi Suzie,

Nick is adjusting the linking now. Do you want the delta symbols to be included as part of the linked entity?

Karen

nickstiffler commented 5 years ago

The allele here is rlm1, so it doesn't match the rlm1Δ. We could add the delta to the allele lexica or create an exception that links a trailing delta when present.

kyook commented 5 years ago

I would set it to just link the trailing delta when present. Worm also has these types of variation suffixes; specifically, they are gof, lof, gf, lf, dm, and sd.

There are probably others, so if possible it would be good if there was a way to be able to modify these modifiers, rather than relying on hard coding these things.
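A sketch of what a non-hard-coded modifier list might look like. Everything here is hypothetical: the `trailing_modifiers` setting, the function name, and the assumption that a modifier follows the entity directly with no separator. The function returns the entity span and the modifier separately, so whether the modifier becomes part of the link stays a separate decision.

```python
import json
import re

# Hypothetical curator-editable setting; in practice this could live in a file.
SETTINGS_JSON = '{"trailing_modifiers": ["\\u0394", "gof", "lof", "gf", "lf", "dm", "sd"]}'

def find_with_modifiers(entity, text, modifiers):
    """Return (entity_span, modifier_or_None) for each occurrence of the entity."""
    # Try longer modifiers first so a short one never shadows a longer one.
    ordered = sorted(modifiers, key=len, reverse=True)
    alternatives = "|".join(re.escape(m) for m in ordered)
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(?P<entity>" + re.escape(entity) + r")"
        r"(?P<modifier>" + alternatives + r")?(?![A-Za-z0-9])"
    )
    return [(m.span("entity"), m.group("modifier")) for m in pattern.finditer(text)]

modifiers = json.loads(SETTINGS_JSON)["trailing_modifiers"]
print(find_with_modifiers("rlm1", "The rlm1\u0394 strain versus rlm1.", modifiers))
```

Adding a new suffix then means editing the setting rather than the matching code.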


nickstiffler commented 5 years ago

I made the necessary fixes to allow those things to be automatically linked now. This includes genes prefixed with pr or oe, subscripted entities, entities preceded by a hyphen, and entities followed by a trailing delta symbol.

When I developed the tool to manually add links, it was intended to only be used for lexica that are missing from our database. In theory, everything in the database should be linked automatically. When things aren't linked, it is because the entity appears in a way the tool wasn't designed to handle. Clicking "Link all" will not work in these cases because the linking tool hadn't been updated to recognize these situations. It is important to keep track of all instances where a known entity is not being linked automatically so we can update the tool to handle as many of these cases as possible and save time moving forward.

suzialeksander commented 5 years ago

The pr and oe prefixes are not typical, although it's not unheard of for them to be used, and this paper clearly uses them a lot.

Yeast also has notation in the format of XxxN::XxxN or XxxNdelta::XxxN, etc., so if you could make sure the linking works when an entity is immediately preceded or followed by a colon, that would be great.

I chatted with @robnash, and SGD has historically not linked the delta after an entity. However, we would much prefer there to be a consensus among the groups doing markup, to produce a more consistent view for the readers. So, if there is anyone doing markup that would argue for the delta to be linked, we would like to hear it. This may involve doing a Quick Fix for now, to get the Link All functioning, and then we could discuss this and other quirks in person at Alliance Face to Face in Dec.

suzialeksander commented 5 years ago

Another case: TEL1/TEL1-hy909

TEL1 should be linked in both occurrences, so a prefix of / should be added too. I think some of the above changes have gone through, because one of the papers I'm currently doing is linking up pretty well. Thanks.

nickstiffler commented 5 years ago

Great to hear. I will see what I can do about the /.

suzialeksander commented 4 years ago

Link all failed again on https://bioentity.link//#/publication/10.1534/genetics.119.302971. It particularly affected 16 occurrences of "yRAD27", I think this time because the entity was (usually) directly preceded by the letter y and sometimes followed by a slash. It's not uncommon for us to have a gene/protein prefixed by a y or even Sc.

yRAD27

appears as:

(hFEN1/yRAD27) yRAD27-deficient yRAD27
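A sketch of how an allow-list of prefixes could cover this case, applied to the three forms listed above. The list `ALLOWED_PREFIXES` and the function name are hypothetical; the prefixes are the ones mentioned in this thread (pr, oe, y, Sc), and the code is not the tool's actual implementation.

```python
import re

# Hypothetical allow-list built from the prefixes mentioned in this thread.
ALLOWED_PREFIXES = ["pr", "oe", "y", "Sc"]

def find_prefixed(entity, text, prefixes=ALLOWED_PREFIXES):
    """Return spans of the entity, tolerating one allowed prefix directly before it."""
    ordered = sorted(prefixes, key=len, reverse=True)
    prefix_group = "|".join(re.escape(p) for p in ordered)
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(?:" + prefix_group + r")?"
        r"(?P<entity>" + re.escape(entity) + r")(?![A-Za-z0-9])"
    )
    # Only the entity itself is reported, so the prefix stays outside the link.
    return [m.span("entity") for m in pattern.finditer(text)]

text = "(hFEN1/yRAD27) yRAD27-deficient yRAD27"
print(find_prefixed("RAD27", text))  # three spans, one per occurrence
print(find_prefixed("FEN1", text))   # empty: "h" is not in the allow-list
```

Because the slash, the closing parenthesis, and the hyphen are not letters or digits, they do not block the match; only an unlisted letter directly in front of the entity does.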