geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Removes JXON use for native DOMParser for Pubmed abstract parsing #685

Open AlexanderNull opened 1 year ago

AlexanderNull commented 1 year ago

fixes #655

First time touching this codebase, so I tried to keep changes as minimal as possible. Line indentation looks a bit off because the older code appears to mix tabs and spaces. Didn't attempt to change that, as it should be addressed in a larger formatting once-over changeset if needed.

As for the changes I did make: JXON was having difficulty with the titles and abstracts returned for certain articles, as uncovered in the linked AmiGO issue. Replacing the out-of-date JXON library with the native DOMParser gives more control over how the returned values are formatted and does not break on embedded HTML tags in the results, as JXON did.

I chose to use each node's textContent value rather than innerHTML so that those HTML tags are stripped by default. If preserving PubMed's inconsistent use of tags is preferred, innerHTML can be used instead.
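
A minimal sketch of that choice, not the exact code in this changeset: the helper name `nodeText`, the `keepTags` flag, and the sample `ArticleTitle` fragment are made up for illustration, and the usage part assumes a browser where the native `DOMParser` is available.

```javascript
// Hypothetical helper (not from the AmiGO codebase): return a node's content
// either as plain text (tags stripped) or with embedded markup preserved.
function nodeText(node, keepTags) {
  if (!node) {
    return "";
  }
  // textContent concatenates the text of all descendants, dropping tags;
  // innerHTML serializes the child nodes, keeping tags such as <i> or <sub>.
  return keepTags ? node.innerHTML : node.textContent;
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node).
if (typeof DOMParser !== "undefined") {
  // Illustrative fragment only; real PubMed responses are much larger.
  var sample = "<ArticleTitle>Role of <i>TP53</i> in repair.</ArticleTitle>";
  var doc = new DOMParser().parseFromString(sample, "text/xml");
  var title = doc.getElementsByTagName("ArticleTitle")[0];
  console.log(nodeText(title, false)); // plain text, tags stripped
  console.log(nodeText(title, true));  // embedded markup kept
}
```

Switching between the two behaviours is then a one-argument change, which may make it easier to revisit the decision later.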

Didn't find tests related to this page, and @kltm advised that getting this running locally is a bit daunting, so leaving this in their capable hands for now.

kltm commented 1 year ago

Cheers! Queued to test (we're running a little behind).

kltm commented 6 months ago

(Okay, running a lot behind.)

kltm commented 6 months ago

Hm. As an experiment, we have (I believe) your code at

http://amigo-exp.geneontology.io/amigo/reference/PMID:30352852 and the current HEAD at: https://amigo.geneontology.org/amigo/reference/PMID:30352852

Unfortunately, a little issue on the fixed code with Uncaught ReferenceError: xmlParser is not defined.

kltm commented 6 months ago

Making a guess that you meant the default DOM parser, I added:

                        var xmlParser = new DOMParser();

Unfortunately, now it goes into a parse error immediately after. I've left this code in place for the moment, if you're still interested.
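
If the failure turns out to be in the XML parse itself rather than in the JavaScript, one way to surface it is to inspect the returned document: for XML MIME types, `parseFromString` reports malformed input through a `<parsererror>` element instead of throwing. A rough sketch under that assumption, with `parseXmlOrThrow` as a hypothetical name and the `parser` argument added only so the function can be exercised outside a browser:

```javascript
// Hypothetical helper: parse an XML string and throw on malformed input.
// `parser` is optional; it defaults to the browser's native DOMParser.
function parseXmlOrThrow(xmlString, parser) {
  var xmlParser = parser || new DOMParser();
  var doc = xmlParser.parseFromString(xmlString, "text/xml");
  // Malformed XML does not raise an exception here; the returned document
  // contains a <parsererror> element describing the problem instead.
  var errors = doc.getElementsByTagName("parsererror");
  if (errors.length > 0) {
    throw new Error("XML parse error: " + errors[0].textContent);
  }
  return doc;
}
```

Wrapping the call this way should at least put the parser's own message in the console, which may help separate a malformed response from a bug in the calling code.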