earthref / MagIC

EarthRef's MagIC Web Application
https://earthref.org/MagIC
MIT License

harvesting issue #575

Closed valentinedwv closed 4 weeks ago

valentinedwv commented 1 year ago

Did something change? EarthCube GeoCODES harvested on April 24th (to a dev instance), but is having issues today.

The JSON-LD is not being found. I can see it in Chrome DevTools, where the page redirects to https://www2.earthref.org/MagIC/search

But with Gleaner we are not seeing anything, and neither is validator.schema.org.

njarboe commented 1 year ago

Hi Dave. I don't think anything should have changed. Is it still not working? I have informed Rupert about this issue and he might be able to give more insight on the problem.

valentinedwv commented 1 year ago

I don't think validator.schema.org had issues before. We have a link for https://dx.doi.org/10.7288/V4/MAGIC/15607 in our sources notes sheet that no longer works: https://validator.schema.org/#url=https%3A%2F%2Fdx.doi.org%2F10.7288%2FV4%2FMAGIC%2F15607

rminnett commented 1 year ago

Hi Dave,

I can see the <script id="schemaorg" ... header element in the source of https://dx.doi.org/10.7288/V4/MAGIC/15607 after it has finished loading. I don't think anything has changed in the way the page loads, but I'm not sure that validator.schema.org will work if it isn't waiting for the JSON-LD to be lazy-loaded.

The Google structured data test appears to be able to read the data, though: https://search.google.com/test/rich-results?url=https%3A%2F%2Fdx.doi.org%2F10.7288%2FV4%2FMAGIC%2F15607

I remember that Doug Fils had to write some extra code to support the lazy-loading content on our pages when he was working on P418 - do you know if that's still being used in Gleaner?
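To illustrate why a harvester that only reads the initial HTML response would miss lazy-loaded JSON-LD, here is a minimal sketch using only the Python standard library. It is not Gleaner's or the validator's actual code. The two HTML strings are made-up stand-ins for "before" and "after" client-side rendering, and `type="application/ld+json"` on the `schemaorg` element is an assumption, since the attribute list is elided above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_jsonld(html):
    """Return the parsed JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    docs = []
    for block in parser.blocks:
        try:
            docs.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the harvest
    return docs


# Hypothetical initial response: an empty application shell, no JSON-LD yet.
INITIAL_HTML = (
    "<html><head><title>MagIC</title></head>"
    '<body><div id="app"></div></body></html>'
)

# Hypothetical DOM after scripts have run and injected the JSON-LD element.
RENDERED_HTML = (
    '<html><head><script id="schemaorg" type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
    "</script></head><body></body></html>"
)

print(len(extract_jsonld(INITIAL_HTML)))   # nothing to harvest without rendering
print(len(extract_jsonld(RENDERED_HTML)))  # found once the page has rendered
```

A tool that fetches the page without executing scripts is in the first situation; a headless browser that waits for rendering is in the second.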

Rupert

valentinedwv commented 1 year ago

Using Gleaner, aka Doug's tool... There have been a couple of changes to wait for full rendering, but Gleaner and the validator both appeared to be working a couple of weeks ago. Will just table it until we have more bandwidth to dig.

valentinedwv commented 1 year ago

Different question:

rminnett commented 1 year ago

Hi Dave,

There were a few commits deployed on April 26th and 27th related to retrieving reference metadata from DataCite as well as Crossref when contributions to MagIC are being made, but that shouldn't affect the JSON-LD headers or the way the pages are rendered.

Sorry, we don't have a better answer - if it's too much trouble to solve this in the headless rendering during harvesting, we can look into rendering the JSON-LD in the initial HTML response instead of the lazy-loaded content.
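As a rough idea of what the second option could look like, here is a minimal sketch of embedding the JSON-LD in the HTML before it is sent, so that no client-side rendering is needed to find it. This is not MagIC's code. The function names, the `schemaorg` id default, and the plain string-based injection are assumptions for illustration only.

```python
import json


def render_jsonld_script(doc, element_id="schemaorg"):
    """Serialize a JSON-LD document into a <script> element.

    "<" is escaped as \\u003c so a string value containing "</script>"
    cannot close the element early; JSON parsers decode it back to "<".
    """
    payload = json.dumps(doc, ensure_ascii=False).replace("<", "\\u003c")
    return (
        f'<script id="{element_id}" type="application/ld+json">'
        f"{payload}</script>"
    )


def inject_into_head(html, script_tag):
    """Insert script_tag just before the first </head>, if there is one."""
    index = html.find("</head>")
    if index == -1:
        return html  # no <head> to inject into; leave the page unchanged
    return html[:index] + script_tag + html[index:]


# Hypothetical dataset description, only for demonstration.
example_doc = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example contribution",
}
page = inject_into_head(
    "<html><head><title>MagIC</title></head><body></body></html>",
    render_jsonld_script(example_doc),
)
print(page)
```

With the element present in the initial response, a harvester or validator that does not execute scripts would still be able to read it.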

Rupert

valentinedwv commented 1 year ago

Rerunning the harvest, and no errors in the first three hours using Gleaner.

Still funky that it's not being seen in the validator. Filed an issue there.