cannin / enhance_nlp_interaction_network_gsoc2020


Make Indra HTML Assembler Work With Chrome Plugin #5

Open cannin opened 4 years ago

cannin commented 4 years ago

The goal is to submit a pull request for the HTML Assembler that adds a link that is compatible with the Chrome plugin.

https://indra.readthedocs.io/en/latest/modules/assemblers/html_assembler.html

The INDRA statements you get will likely have PMIDs that need to be converted to PMC IDs, so that you can link to the full text of the manuscript.

https://indra.readthedocs.io/en/latest/_modules/indra/literature/pmc_client.html

This additional feature should not disturb the original behavior. In practice, this means adding a parameter "add_full_text_search_link" that is set to False by default. The new link text can be "Full-Text Search: PMC12345", and it should only show up if a PMC ID is available; you'll have to make this change in the template.
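A minimal sketch of the opt-in behavior in Python (INDRA's language). The helper `full_text_link` and the dict it returns are hypothetical, not INDRA's API; only the flag name `add_full_text_search_link` and the link text come from this issue, and the PMC article URL pattern is the one used in the example links later in this thread.

```python
from typing import Dict, Optional

PMC_ARTICLE_URL = "https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/"


def full_text_link(pmcid: Optional[str],
                   add_full_text_search_link: bool = False) -> Optional[Dict[str, str]]:
    """Return link data for the template, or None when no link should be shown.

    Hypothetical helper: with the default (False) the assembler output is unchanged.
    """
    # Default off, so the existing HTML output is not disturbed.
    if not add_full_text_search_link:
        return None
    # Only show the link when a PMC ID is actually available.
    if not pmcid:
        return None
    return {
        "text": f"Full-Text Search: {pmcid}",
        "url": PMC_ARTICLE_URL.format(pmcid=pmcid),
    }
```

The template would then render the link only when this value is not None, which keeps the "only if a PMC ID is available" rule in one place.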

https://github.com/sorgerlab/indra/blob/ba8a467e0795cd0b3e5ba128cf720a74c4c18c56/indra/assemblers/html/templates/indra/statements_view.html

You should let me test it before submitting it as a pull request.

PritiShaw commented 4 years ago

Hi Mentor

While trying to see how it works using your script, I found that a basic implementation for PMCID is already present: https://github.com/sorgerlab/indra/blob/e631705c7e5412faf0398a614ed810ad4188a8fd/indra/assemblers/html/templates/indra/statements_view.html#L377

But when I used PMC7064752 as the source for statements, I was unable to get the pmcid in text_refs.

Should I first look into the root cause of the PMCID missing from "text_refs" as a separate PR, before proceeding with this one? If this is intentional, I will change my approach and add pmcid in make_json_model.

Also, I searched the docs but could not find anything like reach.api.process_pmc(pmcid) for PubMed IDs. Can you please point me to the function?

Thanks

cannin commented 4 years ago

@PritiShaw I would post the question to Ben, in parallel with any checking you do.

PritiShaw commented 4 years ago

Since the PMCID is missing from text_refs, I fetched the PMCID from the PMID using the function id_lookup. You can see the output and the commit here: Output HTML, Commit
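The lookup step can be sketched as: prefer a PMCID already present in text_refs, otherwise convert the PMID. The converter is passed in as a callable so the sketch runs offline; in INDRA that role would be played by `id_lookup` from `indra.literature.pmc_client` (linked above), which we assume returns a dict with a `'pmcid'` key, e.g. called as `lambda pmid: pmc_client.id_lookup(pmid, 'pmid')`. Check the module docs before relying on that. `resolve_pmcid` is a hypothetical name.

```python
from typing import Callable, Dict, Optional

# A converter from PMID to a dict of IDs (assumed shape: {'pmcid': ..., ...}).
Lookup = Callable[[str], Dict[str, Optional[str]]]


def resolve_pmcid(text_refs: Dict[str, str], lookup: Lookup) -> Optional[str]:
    """Return a PMC ID for an evidence's text_refs, or None if none can be found."""
    # Normalize key case; INDRA text_refs typically use upper-case keys like 'PMID'.
    refs = {key.upper(): value for key, value in (text_refs or {}).items() if value}
    if "PMCID" in refs:
        return refs["PMCID"]
    pmid = refs.get("PMID")
    if pmid is None:
        return None
    # Fall back to an ID-conversion lookup (assumed: pmc_client.id_lookup in INDRA).
    pmcid = lookup(pmid).get("pmcid")
    return pmcid or None
```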

cannin commented 4 years ago

@PritiShaw can you try the new Chrome plugin for links to text fragments: https://9to5google.com/2020/06/17/link-to-text-fragment-chrome-extension/ This might remove the need for a custom plugin and make things more standardized.

cannin commented 4 years ago

This example link works for me without a plugin in Chrome 83.0.4103.106: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3118169/#__p10:~:text=Entity%20glyphs%20are%20differentiated%20by%20their,different%20arrowheads%20or%20other%20line%20end%2Dmarks.

PritiShaw commented 4 years ago

I implemented the text fragment search; you can see the result in output_text_fragment.html. I also want to mention some points:

  1. It is not performing well for sentences with special characters
  2. It is highlighting but not scrolling to the highlighted sentence
  3. Documentation followed: https://web.dev/text-fragments/
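Part of the special-character problem is encoding. Per the web.dev documentation above, the directive has the form `#:~:text=[prefix-,]textStart[,textEnd][,-suffix]`, so a literal `-`, `,` or `&` inside a term must be percent-encoded. Python's `urllib.parse.quote` never encodes `-`, so this sketch encodes it by hand (the PMC example link earlier in the thread also uses `%2D`). The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def encode_term(term: str) -> str:
    """Percent-encode one text-fragment term, including '-' which quote() leaves alone."""
    return quote(term, safe="").replace("-", "%2D")


def text_fragment_url(article_url: str, text_start: str,
                      text_end: Optional[str] = None) -> str:
    """Append a ':~:text=' directive to an article URL (no element id)."""
    directive = encode_term(text_start)
    if text_end:
        # Range match: highlight from text_start through text_end.
        directive += "," + encode_term(text_end)
    return f"{article_url}#:~:text={directive}"
```

For example, `text_fragment_url("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064752/", "were consistently protected against infection")` produces the same form as the PMC7064752 link that scrolls correctly later in this thread.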
cannin commented 4 years ago

@PritiShaw

  1. Yes, we have to be clever here. I would suggest some code that A) only grabs the first 6 words, and B) if you notice weird things like "XREF_FIG" or special characters, maybe uses logic to switch to the textStart and textEnd parameters. I don't think it will always work perfectly, but it should work more often.
  2. Does scrolling fail only for PMC, or everywhere for you? This link scrolls for me:

https://www.npr.org/2020/06/18/880281963/sigh-of-relief-or-slippery-slope-advocates-and-opponents-react-to-daca-ruling/#callout__input:~:text=DACA%20recipients%20will%20continue%20to%20renew
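One way to code the heuristic in point 1: use the first six words when they are all plain; otherwise use textStart/textEnd around the problematic tokens so the range match skips over them. What counts as "plain" (the `_PLAIN` pattern) is an assumption to tune, and `choose_fragment_terms` is a hypothetical name. It returns raw terms, which still need percent-encoding before they go into the URL.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a "plain" word is ASCII letters/digits with simple punctuation.
_PLAIN = re.compile(r"^[A-Za-z0-9(),.;:'-]+$")


def _is_plain(word: str) -> bool:
    return "XREF_" not in word and bool(_PLAIN.match(word))


def _plain_runs(words: List[str]) -> List[List[str]]:
    """Split the sentence into runs of consecutive plain words."""
    runs, current = [], []
    for word in words:
        if _is_plain(word):
            current.append(word)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def choose_fragment_terms(sentence: str, n_words: int = 6,
                          edge_words: int = 3) -> Optional[Tuple[str, Optional[str]]]:
    """Return (text_start, text_end) for a text fragment, or None if nothing usable."""
    words = sentence.split()
    head = words[:n_words]
    if head and all(_is_plain(w) for w in head):
        # Common case: the first few words are clean; use them as textStart only.
        return " ".join(head), None
    runs = _plain_runs(words)
    if not runs:
        return None
    if len(runs) == 1:
        return " ".join(runs[0][:n_words]), None
    # Problem tokens in between: match from the first clean run to the last one.
    return " ".join(runs[0][:edge_words]), " ".join(runs[-1][-edge_words:])
```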

cannin commented 4 years ago

What version of Chrome are you on?

cannin commented 4 years ago

This scrolls for me:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064752/#idm140627199571520:~:text=were%20consistently%20protected%20against%20infection

But getting the idm140627... element ID will be problematic. I was able to get the link using the Google extension: https://chrome.google.com/webstore/detail/link-to-text-fragment/pbcodcjpfjdpcineamnnmbkkmkdpajjg/related

cannin commented 4 years ago

Can you ignore the div? This also works/scrolls for me: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064752/#:~:text=were%20consistently%20protected%20against%20infection

PritiShaw commented 4 years ago

My Chrome version is 83.0.4103.106. I tried the links in incognito and guest mode, and scrolling works fine there (both with "maincontent" specified and without it); I am not sure why this happens. It works in normal Chrome as well, but occasionally it does not. Can we proceed, since the failure rate is low?

Thanks for the idea on improving the cases where the sentence has special terms/characters; I will work on that.

PritiShaw commented 4 years ago

[Screenshot: INDRA result] [Screenshot: the same sentence as on PubMed Central]

While checking why our text fragment was not working, I saw that the text is slightly different in INDRA even though the source is the same (as far as I understand): the INDRA output has an extra space that is not present on the PubMed Central site. I am unable to find the root cause of this extra space.

Regarding the improvement, I am trying to use a prefix and a suffix of the target instead of the complete sentence.

Similar problem: a "-" is missing from the INDRA result, with whitespace in its place.
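Because the INDRA sentence can differ from the page in whitespace, the text could be normalized before building the fragment. Which differences to undo is an assumption based only on the cases reported here: runs of whitespace are collapsed, and spaces before closing punctuation or after opening brackets are dropped. A hyphen that became whitespace cannot be recovered this way, so such words are better avoided (for example with textStart/textEnd). `normalize_sentence` is a hypothetical name.

```python
import re


def normalize_sentence(text: str) -> str:
    """Collapse whitespace runs and drop stray spaces around punctuation."""
    # Collapse any run of whitespace (spaces, tabs, newlines, NBSP) to one space.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove a space left before closing punctuation, e.g. "infection ." -> "infection."
    text = re.sub(r" ([,.;:)\]])", r"\1", text)
    # Remove a space left after an opening bracket, e.g. "( CSP" -> "(CSP".
    text = re.sub(r"([(\[]) ", r"\1", text)
    return text
```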

cannin commented 4 years ago

I think we can move forward. I want this in the hands of more people to test. Can you re-make the example HTML you did before? What percentage of links will fail? If you try 20, how many don't work?

PritiShaw commented 4 years ago

As you suggested, I have modified the logic; you can find the output HTML here. I ran it on statements from PMCIDs 7064752, 5791571 and 7064751, and you can find the script here. I have made the changes in the function openPMCIDJournal.

Present logic for text fragment

The following is the test run I performed:

Statement | Works? | Reason
--- | --- | ---
SAG inhibits DNAJC5 | |
SAG activates DNAJC5 | |
RECQL4 activates DNAJC5 | | Missing "-" in "CSP specific"
DNAJC5 activates immune response | |
Aluminium(3+) activates immune response | |
SMQ activates TNF-α | | Partial match
SAG activates IFN-γ | |
S/AS03 activates DNAJC5 | | Missing "-" in "CSP specific"
IFN-γ activates APC | |
GS26575 activates GAST | |
Combo5 activates immune response | |
CD4 activates IL2 | |
OleB gene activates β-lactone natural products | |
OleA genes activates OleA proteins | |
Sialyllacto-N-tetraose a binds Sialyllacto-N-tetraose b | |
OleA enzymes activates biosynthetic process | | "biosynthesis" not present in PMC
OleA activates trans-aconitic acid | |
AICDA activates OleA proteins | |
OleA activates OleA proteins | |

Total = 20, Success = 17, Failure = 15%

cannin commented 4 years ago

@PritiShaw can you make the finished HTML for one of the papers you mentioned? Can you also try it on a separate set of 20? If you get a similar 15% failure rate, move ahead to making a pull request on this issue (let me know before you submit).

PritiShaw commented 4 years ago

Hi Mentor, please find the output for PMCID 7064751 here. Is this what you wanted to see? Let me know if you need something else. I will analyze another 20 sentences and let you know.

cannin commented 4 years ago

Thanks. Are you under the impression that #maincontent works better than leaving it out? If so, why? I tried 5 of the links and none worked with maincontent. If the links work without maincontent for you, maybe we should not include it?

@RohitChattopadhyay could you check @PritiShaw output and say if with or without maincontent works for you?

RohitChattopadhyay commented 4 years ago

Hi Mentor, I tested the links and found the following:

  1. Links opened directly from the INDRA output are not highlighting/working
  2. If we copy the same link from the omnibox and paste it into a new tab, it works perfectly

The behavior is the same for links with and without "maincontent" in Chrome and Chromium-based Edge on Windows 10.

I don't think it's due to the URL, but rather to the way the user is redirected (using window.open()). I checked the documentation/blog link mentioned in this conversation and found that the text is only highlighted if the navigation comes from a user activation (ref), and that we need to set rel=noopener on the hyperlink. @PritiShaw can you please check whether we can use an <a> tag for the hyperlink? I see that the logic is in JavaScript; maybe implementing the same in Jinja will solve the problem.
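The suggestion above, a plain anchor instead of a window.open() redirect, can be sketched by emitting the link as static HTML on the Python side, so the navigation comes directly from the user's click. The helper name is hypothetical. The web.dev article linked earlier says text fragments need a user-activated navigation and, for cross-origin navigations, a noopener context, hence `target="_blank"` with `rel="noopener"`; `html.escape` keeps the URL and text safe inside the markup.

```python
import html


def fragment_anchor(url: str, text: str) -> str:
    """Render a static <a> element that opens the text-fragment URL in a new tab."""
    # A real <a> clicked by the user (not window.open from script) is a user-activated
    # navigation; rel="noopener" opens the cross-origin page in a noopener context.
    return (
        f'<a href="{html.escape(url, quote=True)}" '
        f'target="_blank" rel="noopener">{html.escape(text)}</a>'
    )
```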

Thanks

PritiShaw commented 4 years ago

The motivation behind adding "maincontent" was that, in case the browser does not find the text fragment, it scrolls to the "maincontent" div instead. Since maincontent covers most of the page, I think this fallback will not help much.

Following @RohitChattopadhyay's comment, I searched the Chromium issue tracker and found a similar issue: "Text-fragment doesn't activate when navigated via a redirect"
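For reference, the fallback works through the element-id part of the URL: in `#maincontent:~:text=...` the browser tries the text directive first and, according to the web.dev article linked earlier, falls back to the element with that id when the text is not found. A sketch with an optional fallback id; the helper is hypothetical, and it also drops any existing element fragment (such as the `#idm...` ids in the earlier links) using `urllib.parse.urldefrag`.

```python
from typing import Optional
from urllib.parse import quote, urldefrag


def fragment_url_with_fallback(article_url: str, text_start: str,
                               fallback_id: Optional[str] = None) -> str:
    """Build '<url>#[fallback_id]:~:text=<term>', replacing any existing fragment."""
    # Drop an existing '#idm...' style element fragment from the article URL.
    base, _old_fragment = urldefrag(article_url)
    # quote() leaves '-' unescaped, but '-' is syntax inside a text directive.
    term = quote(text_start, safe="").replace("-", "%2D")
    return f"{base}#{fallback_id or ''}:~:text={term}"
```

Passing no `fallback_id` gives the plain form that was reported to work and scroll above.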

Adding Chromium Feature page for reference: Scroll To Text Fragment

The reason I made a separate JavaScript function is that INDRA does not return the sentence as plain text; it sends it as HTML with the terms highlighted. I was not able to extract the plain sentence text from that HTML in Jinja. @cannin, as a workaround, should I write a JS function that populates all the hyperlinks after the content is loaded?
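The plain-text extraction that was hard to do in Jinja is straightforward on the Python side before rendering. A sketch with the standard-library `html.parser`; the markup in the example is made up, and we assume the evidence HTML only wraps plain text in inline tags. `html_to_plain_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collect only the text content of an HTML snippet, ignoring tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_plain_text(snippet: str) -> str:
    """Strip tags from a highlighted evidence sentence and tidy whitespace."""
    parser = _TextCollector()
    parser.feed(snippet)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())
```

The result could then be passed to the fragment-building code as an ordinary template variable, instead of being recovered in the browser.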

PritiShaw commented 4 years ago

@RohitChattopadhyay can you please test the following output: https://gist.github.com/PritiShaw/aee595a5959d4f8be11831b9c5892230#file-output-html

I have modified the JS code to set the href of the links directly.

RohitChattopadhyay commented 4 years ago

All links are working in Chrome and Edge (Chromium) on Windows 10.

For ease of testing, I have made a link with appropriate content type for the gist

RohitChattopadhyay commented 4 years ago

Working in Android Chrome as well

cannin commented 4 years ago

@PritiShaw Thanks, this works.

Can you put this character: https://www.compart.com/en/unicode/U+24D8 with a "title" attribute that says "This link will search for and highlight the sentence in the full-text article. Works with Chrome 80+" Put the character after and outside the link.

This link might be helpful: https://stackoverflow.com/questions/2731214/html-copy-doesnt-show
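A sketch of the requested markup: the ⓘ character (U+24D8) placed after and outside the link, with the tooltip text in a `title` attribute. The `<span>` wrapper and the function name are assumptions; any element with a title attribute would do.

```python
import html

INFO_ICON = "\u24d8"  # CIRCLED LATIN SMALL LETTER I
TOOLTIP = ("This link will search for and highlight the sentence in the "
           "full-text article. Works with Chrome 80+")


def link_with_info_icon(anchor_html: str) -> str:
    """Append the info icon with a tooltip after (not inside) an existing <a> element."""
    icon = f'<span title="{html.escape(TOOLTIP, quote=True)}">{INFO_ICON}</span>'
    return f"{anchor_html} {icon}"
```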

If you cannot make it work within 2 hours, ignore it and move on to making a pull request in INDRA.

@RohitChattopadhyay thanks for the githack link. @PritiShaw please use this from now on for any HTML example you create.

PritiShaw commented 4 years ago

Thanks for the links. I have implemented it; please find the links below: Output link, Source code [screenshot]

Let me know if any changes are required. I will send the PR to INDRA tomorrow (in 12 hours).

PritiShaw commented 4 years ago

Related PR https://github.com/sorgerlab/indra/pull/1120

cannin commented 4 years ago

PR looks good; we wait.