IUBLibTech / newton_chymistry

New version of 'The Chymistry of Isaac Newton', using XProc pipelines to generate a website based on TEI XML encodings of Newton's alchemical manuscripts, and Apache Solr as a search engine.

Update LSA Code when we launch for real #121

Open mdalmau opened 2 years ago

mdalmau commented 2 years ago

@randalldfloyd has a static version of the LSA tool running on poplar. Once we launch, he will update it. Marking this as a production task.

mdalmau commented 1 year ago

Adding @randalldfloyd's last email summary here so we don't forget that this needs to be done when we finally deploy the site:

So now that we have the old LSA working again, there are still problems getting it working for the new P5 site and data.

If I recall, the problem is that we can get the latest version of LSA code working standalone, but when we try to proxy it through the new XProc site, it doesn't work. Con's design is to make it look like the LSA component is part of the overall P5 site by doing a backend fetch, wherever it is deployed, and then proxying its response back through to the browser. This actually worked with the old PHP site as it was deployed on maple, but we couldn't get it working with the new LSA PHP no matter where it was deployed.

You can see that this is still a problem because the production P5 site at newton.dlib.indiana.edu can no longer render the LSA component like it could before, and that's because in order to get LSA working again under webapp1.dlib/newton/lsa, I had to run the new code against the legacy database. So now it exhibits the same problem:

https://newton.dlib.indiana.edu/newton/lsa/index.php

Last time I looked at it I wondered if it was something different about the web server itself, but it doesn't work if I deploy it to Apache, Nginx, or PHP-FPM. So now I would lean more towards some difference in the structure of the HTML in the new LSA code. I believe Con's XProc code targets something in the HTML body to pipe back through to the browser, and that would be somewhat brittle and prone to break due to minor differences.
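To illustrate why targeting a specific element in the proxied HTML is brittle, here is a minimal Python sketch. It is not Con's XProc code; the function name `extract_fragment` and the `content` id are hypothetical, and the real pipeline may select something else entirely. It only shows that a selector tied to one element stops matching after a small structural change in the upstream page.

```python
import xml.etree.ElementTree as ET


def extract_fragment(page, target_id="content"):
    """Return the serialized <div> with the given id, or None if it is not found.

    The id "content" is a hypothetical stand-in for whatever the proxy targets.
    """
    try:
        # A strict XML parse: any well-formedness error means nothing is returned.
        root = ET.fromstring(page)
    except ET.ParseError:
        return None
    node = root.find(f".//div[@id='{target_id}']")
    if node is None:
        # The page parsed, but its structure no longer matches the selector.
        return None
    return ET.tostring(node, encoding="unicode")


# Hypothetical pages: same content, but the wrapper id differs.
old_page = '<html><body><div id="content"><p>LSA</p></div></body></html>'
new_page = '<html><body><div id="main"><p>LSA</p></div></body></html>'

print(extract_fragment(old_page))
print(extract_fragment(new_page))
```

If something like this is what is happening, comparing the old and new LSA output around the targeted element would be a quicker check than redeploying to different web servers.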

If we can't ever get that to work we will have to resort to redirecting the browser out completely to the LSA tool versus trying to proxy it through the XProc site, which may require some work to the way the menu items and internal links are derived so that they point correctly back to the P5 site.

From: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu) Sent: Friday, March 10, 2023 3:00 PM To: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Subject: RE: all LSA components fail to load -- dlib mysql?

Thanks for checking the load in so many places. It is reassuring enough that I will stop worrying.

I will check whether it’s still a problem here.

Wally

From: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Sent: Thursday, March 9, 2023 2:33 PM To: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Subject: Re: all LSA components fail to load -- dlib mysql?

Wally,

I tested this at the Library from a Windows machine today, and both Chrome and Firefox loaded consistently under 3 seconds with cache disabled. When you say the spinning GIF, do you mean the Ajax loader animation? My load times are so fast I never even see that GIF.

From: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Sent: Thursday, March 9, 2023 10:05 AM To: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Subject: Re: all LSA components fail to load -- dlib mysql?

Wally,

I haven't experienced the delay in loading. Even here at my house, not on VPN, I get the page and all JavaScript fully loaded in anywhere between 3 and 5 seconds per the console statistics, but visibly it doesn't even look like it took that long, and all controls are available immediately after it appears to be done loading. That's with cache disabled - if I let it cache the JS, the load is nearly instantaneous at about 1.5 seconds. I'll be in the office today and I can try on a Windows machine from there.

From: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu) Sent: Wednesday, March 8, 2023 5:57 PM To: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Subject: RE: all LSA components fail to load -- dlib mysql?

Hi Randall,

Thanks for checking into this problem further.

Your strategy of using the github version with the legacy data would have been my suggestion, too.

I also find that the LSA component on webapp1 works in Chrome and Firefox, thank you, thank goodness, that is a relief.

Overall I’m happy with what’s on webapp1 and hope the same fix can be applied to the development and production versions on possum as well.

From here, curiously, there was a lengthy delay in finishing the initial load of the LSA page in both browsers, with the spinning GIFs. The html text itself does appear but none of the visible controls are responsive until the spinning GIFs stop and the page becomes “quiet.”

If I try to do something in the component before the spinning stops, like choosing a job then clicking the CONTINUE button, I interrupt the current initialization and need to restart it (control-R) and wait.

Once the spinning does stop, I can work in the component without further apparent delays.

I don’t know whether other users, like interested colleagues at Cambridge, would also see that initialization delay, but if they do, it could discourage them.

Unless others run into the lengthy initialization on a regular basis. I’m hoping the long hello is local to me, but I will warn them that the first step may be a long one.

Thanks again!

Wally

From: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Sent: Wednesday, March 8, 2023 12:29 PM To: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Subject: Re: all LSA components fail to load -- dlib mysql?

Wally,

Hi, just now getting some time to loop back on this problem. I have some successes, but then again, I don't have a full understanding of all the permutations involved in running this tool. We have old LSA code that was running on maple; new LSA code that was never really running anywhere; legacy databases correlated with the old site; and new databases correlated with the new P5 site.

So, my question is, is it a valid configuration to use the latest LSA code at https://github.com/IUBLibTech/newton_lsa.git but pointed at the legacy database in order to restore what had been working as webapp1.dlib/newton/lsa? The reason I ask is because the code that was running on maple, available as webapp1.dlib/newton/lsa, must have been much older than what's in the repo. This is evident from the database error that I noted earlier, because that call has been deprecated since or before PHP 7, which is what the new server uses. On maple, it was more likely to have been using PHP 5.
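One way to confirm how much of the old maple code depends on removed database calls is to scan the PHP source for them. The email does not name the failing call, so the sketch below assumes it belongs to the legacy `mysql_*` extension, which was removed in PHP 7.0. The helper name `find_legacy_mysql_calls` and the sample source are hypothetical.

```python
import re

# Matches calls such as mysql_connect( or mysql_query(, but not mysqli_connect(
# and not identifiers that merely end in mysql_... (e.g. my_mysql_connect).
LEGACY_CALL = re.compile(r"(?<![A-Za-z0-9_>$:])(mysql_[a-z_]+)\s*\(")


def find_legacy_mysql_calls(php_source):
    """Return (line_number, function_name) pairs for legacy mysql_* calls.

    This is a plain text scan, so calls inside comments or strings are reported too.
    """
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for match in LEGACY_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical sample source, not taken from the LSA repository.
sample = (
    "<?php\n"
    "$link = mysql_connect('host', 'user', 'pass');\n"
    "$db = mysqli_connect('host', 'user', 'pass');\n"
    "$rows = mysql_query('SELECT 1');\n"
)

for lineno, name in find_legacy_mysql_calls(sample):
    print(f"line {lineno}: {name}")
```

An empty result for the code in the repo, and hits in the copy recovered from maple, would support the idea that the maple deployment predates the repository version.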

If it's okay to run the latest code against the legacy data, I think I have that working at https://webapp1.dlib.indiana.edu/newton/lsa/ . It seems to work but I don't know the tool well enough to really test it. I was able to do a document-document correlation with results and highlighted terms.

If that isn't valid and we can only use the old code that was running on maple for the legacy database, then I will have to create some kind of container to host PHP 5 to run it out of because we aren't going to be supported using PHP 5 natively on the new server.

From: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu) Sent: Monday, February 27, 2023 5:52 PM To: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Cc: Newman, William Royall [wnewman@indiana.edu](mailto:wnewman@indiana.edu) Subject: RE: all LSA components fail to load -- dlib mysql?

Thanks for checking all that, Randall.

My Ubuntu localhost is using php -v 7.4.3 in case that helps.

Please, let me know if I need to do anything, thanks!

Wally

From: Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Sent: Monday, February 27, 2023 2:42 PM To: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu); Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Cc: Newman, William Royall [wnewman@indiana.edu](mailto:wnewman@indiana.edu) Subject: Re: all LSA components fail to load -- dlib mysql?

Hey all, just wanted to let you know what I see at a glance...

First of all, I know this is confusing, but all Newton instances, both old and new, point to the one legacy version of the LSA tool at https://webapp1.dlib.indiana.edu/newton/lsa/index.php. My recollection is that getting LSA working directly in the new XProc apps against new databases had problems that were never worked out. That's a different topic, but I can tell you for certain that all of the instances go to the URL above regardless of what the URL may show in the browser (the new XProc app proxies the LSA HTML from webapp1 back through itself, giving the impression that it is self-contained, but it isn't.)

That being said, the one instance of LSA noted above is currently broken, which explains why they are all broken. At the end of the year last year, the host webapp1.dlib was finally moved off of legacy servers to new infrastructure. The LSA PHP code is now throwing errors in the logs that seem to indicate that MySQL modules are missing or not enabled. Given that, there could be other missing PHP bits to work through. It could even be a PHP version issue; PHP apps are not a piece of our infrastructure I deal with much, but I have been keeping Brian Wheeler informed of my findings.

The good news is that I verified via past email conversations that we did indeed migrate the underlying databases off of legacy servers and into the new MySQL server. They were all working at that time (April 2022), so no worries as far as having lost access to databases or data.

From: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu) Sent: Saturday, February 25, 2023 1:41 PM To: Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Cc: Newman, William Royall [wnewman@indiana.edu](mailto:wnewman@indiana.edu); Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Subject: RE: all LSA components fail to load -- dlib mysql?

All the LSA components on library servers both production and development should be using the DLIB mysql database, not RDC.

My sitehost-test and localhost LSA components do use the RDC mysql database, but we stored the same material in the DLIB database so it was all under DLIB control.

I wonder whether the DLIB mysql database has been moved?

Thanks again, Wally

From: Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu) Sent: Friday, February 24, 2023 1:49 PM To: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu); Halliday, James Leonard [jhallida@indiana.edu](mailto:jhallida@indiana.edu) Cc: Newman, William Royall [wnewman@indiana.edu](mailto:wnewman@indiana.edu); Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Subject: Re: all LSA components fail to load

Hi, Wally,

I have no idea what’s happening and can’t tell from looking at the web site. I have added Jim here since he might need to assign another developer to investigate.

Jim, these are the sites that Wally is referencing, which aren’t actually public yet:

https://newton.dlib.indiana.edu/ The production P5 site, proxies to possum:8197. It is configured to only index published documents out of Xubmit.

https://newton-devel.dlib.indiana.edu/ The development P5 site, proxies to possum:8180. It is configured to index all documents out of Xubmit.

And the webapp1 site is the current, public site that’s still using XTF:

https://webapp1.dlib.indiana.edu/newton The legacy P4 site, proxies to possum:8344.

But the LSA tool, as far as I can tell, and as Wally reported, is not working for any of them, which hopefully means it is something they are commonly experiencing and hopefully it’s something small like updating the connection to RDC.

I am not at all familiar with the technical details for the LSA piece so I am not helpful at all.

Thanks, --Michelle

From: Hooper, Wallace Edd [whooper@iu.edu](mailto:whooper@iu.edu) Date: Friday, February 24, 2023 at 1:34 PM To: Dalmau, Michelle Denise [mdalmau@indiana.edu](mailto:mdalmau@indiana.edu), Floyd, Randall Dean [rdfloyd@indiana.edu](mailto:rdfloyd@indiana.edu) Cc: Newman, William Royall [wnewman@indiana.edu](mailto:wnewman@indiana.edu) Subject: all LSA components fail to load

Hi Michelle and Randall,

I’ve just found that the Newton LSA component is not loading on webapp1 or on the possum prod and dev versions.

Is the connection to the mysql database on the Research Data Complex (RDC) failing?

Since all three are failing and the LSA component starts by reading from RDC, that seems likely.

I hate to bother you with this, it is a nuisance, but there is a sudden burst of interest in LSA from different directions and it would help to show the original versions.

Can you advise?

Wally

randalldfloyd commented 1 year ago

This was sent via email, adding here to keep the issue up-to-date:

Judging by the look of the response in the browser when using the XProc site to proxy the LSA component, it just has the appearance of a fundamental parsing error. And, if you look at it standalone, there are some visual elements that look a little abnormal to me (alignment etc.) that might also point to parsing errors.

Anyway, it occurred to me to run the resulting html from webapp1 through a validator and there are a lot of errors. Some are just complaining of illegal values and improper usage of meta tags and such that comes with deprecation over time, but there are a number of critical errors stemming from mismatched and misplaced quotes. Quickly scanning I notice a missing colon in an inline CSS style, a missing double quote in a form declaration, and in lines 1876 and 1904 there are many instances of single quotes within option values and misplaced single quotes between option tags.

It's one of those things where the browser may be able to guess the intent and let it get away with a lot of that and still be technically functional, but if the XProc proxy is parsing it as XML, it's unlikely to be able to predictably target whatever it's trying to get a hold of to render back.
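The difference between a forgiving browser-style parser and a strict XML parser can be reproduced in a few lines of Python. This is only a sketch under assumptions: the markup is a made-up fragment with a missing double quote in a form declaration, not the actual LSA output, and Python's parsers stand in for the browser and for whatever the XProc proxy uses.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Parse leniently, the way a browser would, and list the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_error(markup):
    """Return the (line, column) of the first XML error, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


# Hypothetical fragments: the second is missing the closing quote after index.php.
good = '<form action="index.php" method="post"><input name="q"/></form>'
broken = '<form action="index.php method="post"><input name="q"/></form>'

print(lenient_tags(good), strict_error(good))
print(lenient_tags(broken), strict_error(broken))
```

The lenient parser still reports a form for the broken fragment, while the strict parser stops at the first error and returns its position. Running the real webapp1 output through a strict parse like this would give the line and column of the first error that an XML-based proxy would hit.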