andrewsu opened this issue 3 years ago.
I tried the older Structured Data Testing Tool to check whether the metadata is being loaded: https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fdiscovery.biothings.io%2Fdataset%2Fda4905854c18028d. That seems to work, so I'm wondering whether the newer testing tool doesn't wait for dynamically embedded content.
Interesting. I was under the impression that the Rich Results Tool would handle dynamic content better. Either way, my understanding was that the Rich Results Tool more closely approximates what Google's crawler does (which seemed to be confirmed by this blog post: https://webmasters.googleblog.com/2020/07/rich-results-test-out-of-beta.html...)
"It handles dynamically loaded structured data markup more effectively." Hmm, I'll do more research on this; there's gotta be a similar case somewhere.
@marcodarko, any update on this ticket? For example, I wonder whether Rich Results would parse the page properly if you put the rendered HTML in a static file. That might tell us whether the problem is in our JSON-LD or in the dynamic loading...
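To try the static-file idea, a minimal sketch (Python, standard library only) that pulls the JSON-LD blocks out of an HTML snapshot saved after the page has finished rendering, so the markup can be inspected or pasted into the testing tool as code. The class and function names are hypothetical and not part of the site's code; it assumes the metadata lives in a script tag with type "application/ld+json".

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by the parser
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # Raises json.JSONDecodeError if the embedded block is not valid JSON
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return the parsed JSON-LD objects found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Running this on the saved snapshot and on the live JSON-LD should show whether the rendered page actually contains the metadata the tool is missing.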
@andrewsu Yes. One possibility is that the page requests a lot of resources and this tool doesn't wait for them, so you get the "Partially loaded" error and, ultimately, the tool doesn't detect any metadata. I'm going to revisit this page, offload as much as possible, and make it as light as possible. Google also has the URL Inspection tool and PageSpeed Insights, so I'll monitor their output and try again. I don't think our site is bloated with resources, but it's worth a shot, and from reading similar issues, this seems to be the cause.
Just an update, not necessarily good news:
Hmm, so I made the page as light as possible, removed all the loading issues we had previously, and improved page performance, and I still get the same error.
One weird thing I found: this dataset, https://discovery.biothings.io/dataset/83dc3401f86819de (Wellderly), works for rich results, but none of the others do. I compared the Outbreak datasets to this one and noticed that the @type for Wellderly had no prefix, so as a test I removed the prefix from all of them and tested again, but that was not the issue. It would have been weird for that to be the problem, but I wanted to try it anyway.
So basically, I'm still looking into why these don't show up as having rich results. Now that the page is cleaned up, I hope I can find the reason faster...
@andrewsu @newgene OK, so it turns out the Rich Results tool doesn't like:
- prefixes on @type values
- any @context other than "@context": "http://schema.org/"
I think that means I may have to modify both the @type and the @context. Obviously that wouldn't be 100% true to its origins, so that's the downside.
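A minimal sketch of that modification, assuming the prefix on @type values is "schema:" (an assumption; other prefixes are deliberately left alone). It returns a new JSON-LD object with a plain schema.org @context and unprefixed schema types, recursing into nested objects such as authors or citations. The function names are hypothetical.

```python
from typing import Any

SCHEMA_PREFIX = "schema:"  # assumption: the prefix used on @type values
PLAIN_CONTEXT = "http://schema.org/"


def strip_prefix(value: Any) -> Any:
    """Turn 'schema:Dataset' into 'Dataset'; leave any other value untouched."""
    if isinstance(value, str) and value.startswith(SCHEMA_PREFIX):
        return value[len(SCHEMA_PREFIX):]
    return value


def _walk(node: Any) -> Any:
    """Rebuild nested dicts/lists, stripping the prefix from every @type."""
    if isinstance(node, dict):
        out = {}
        for key, val in node.items():
            if key == "@type":
                if isinstance(val, list):
                    out[key] = [strip_prefix(t) for t in val]
                else:
                    out[key] = strip_prefix(val)
            else:
                out[key] = _walk(val)
        return out
    if isinstance(node, list):
        return [_walk(item) for item in node]
    return node


def normalize_for_rich_results(doc: dict) -> dict:
    """Return a copy of a JSON-LD object (the input is not mutated) whose
    top-level @context is plain schema.org and whose schema:-prefixed
    @type values are unprefixed."""
    result = _walk(doc)
    result["@context"] = PLAIN_CONTEXT
    return result
```

The trade-off noted above shows up directly here: replacing the @context drops any custom terms the original context defined, so non-schema.org properties lose their meaning in the normalized copy.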
I compared the same metadata as above, and that confirms the test I did. I'll also include the screenshots below:
[Screenshot: Passing Wellderly dataset]
[Screenshot: Failing Outbreak dataset]
[Screenshot: Passing modified Outbreak dataset]
I used the same tool, but pasted the script tag containing the JSON-LD and selected the 'Code' option instead of 'URL' so that I could modify it and re-run.
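For that paste-and-re-run loop, a small sketch that serializes a (possibly modified) JSON-LD object back into a script tag ready to paste into the tool's code input. The helper name is hypothetical. It escapes "</" inside string values, because a literal "</script>" in a description would otherwise end the script element early; "\/" is a valid JSON escape for "/", so the JSON still parses to the same data.

```python
import json


def to_script_tag(doc: dict) -> str:
    """Serialize a JSON-LD object as a <script type="application/ld+json"> tag."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # "<" can only occur inside JSON strings, so this rewrite never touches
    # structure; it just keeps an HTML parser from seeing a closing tag.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```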
To confirm that it expects the context to be schema.org: [screenshot]
After fixing all loading issues and making all datasets appear to derive from schema:Dataset, we submitted the dataset sitemap for indexing on Oct 28th; it was marked as successful, with 44 discovered URLs, the same day.
However, it appears that those 44 URLs had already been indexed, so they don't show up as recently indexed (Oct 28). Maybe no changes were detected and they were skipped??
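If the URLs were skipped because nothing looked changed, one thing that might be worth trying (an assumption, not something established in this thread) is giving each sitemap entry a lastmod date that reflects when its metadata actually changed. The sitemaps protocol defines lastmod as an optional per-URL element; how much weight Google gives it here is unknown. A minimal sketch with the standard library, where the function name and inputs are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, date]]) -> bytes:
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, modified in entries:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
        # W3C date format (YYYY-MM-DD), as accepted by the sitemaps protocol
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    # default_namespace writes a plain xmlns="..." instead of ns0: prefixes
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)
```

The dates should come from when each dataset's metadata really changed; stamping every URL with today's date on each build would make the field meaningless.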
I'll just note that when I created this ticket, Dataset Search had one dataset, yesterday it had three, and today it has four. So trending in the right direction... 🤞 https://datasetsearch.research.google.com/search?query=site%3Adiscovery.biothings.io
... and back down to two indexed datasets... :(
Overall, we're still stuck at ~4 indexed datasets~ (EDIT 2021-01-19) 10 indexed datasets (https://datasetsearch.research.google.com/search?query=site%3Adiscovery.biothings.io) out of 58 currently available. Here's a deep dive on one particular dataset (https://discovery.biothings.io/dataset/da4905854c18028d), the one mentioned in the first comment in this issue.
Observation 1: this dataset is not indexed in Google Dataset Search (https://datasetsearch.research.google.com/search?query=site%3Adiscovery.biothings.io&docid=s9UmBdKO4VvS4gDUAAAAAA%3D%3D)
Observation 2: this URL is successfully crawled by the Google crawler
Observation 3: The Rich Results tool has trouble parsing our structured metadata, likely due to the dynamic injection of the metadata? See the vuex.js loading error in the screenshot below, and note that it's different from the schema-related issues that @marcodarko posted above. Bottom line, I think something is still going on with how Google handles our dynamic content...
Observation 4: Plugging the dynamically rendered HTML into the Rich Results tool led to two warnings and one error. Focusing on the error, I don't quite get the error message. It says the citation shouldn't be of type ScholarlyArticle, but https://schema.org/citation says it can be either a CreativeWork or Text, and https://schema.org/ScholarlyArticle is clearly a subclass of CreativeWork.
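The subclass argument can be checked mechanically. The sketch below hand-copies only the fragment of the schema.org type hierarchy it needs (ScholarlyArticle is a subtype of Article, which is a subtype of CreativeWork) and tests a value's type against the expected types of citation. It is an illustration, not how Google's validator works. Note that a prefixed value such as "schema:ScholarlyArticle" fails a plain name lookup, which is at least consistent with the @type prefix issue found earlier; that is a guess, not a diagnosis.

```python
# Hand-copied fragment of the schema.org hierarchy (child -> parent).
# Only the types needed for this check are listed; not the full vocabulary.
PARENT = {
    "ScholarlyArticle": "Article",
    "Article": "CreativeWork",
    "Dataset": "CreativeWork",
    "CreativeWork": "Thing",
}

# Expected types of https://schema.org/citation
CITATION_RANGE = {"CreativeWork", "Text"}


def is_subtype(type_name: str, ancestor: str) -> bool:
    """True if type_name equals ancestor or inherits from it via PARENT."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def valid_citation_type(type_name: str) -> bool:
    """True if a citation value of this @type fits the property's range."""
    return any(is_subtype(type_name, allowed) for allowed in CITATION_RANGE)
```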
Small update: I updated that page and removed all libraries that could be replaced with vanilla JS (e.g., Vuex and Axios). That removes the loading issues, but I still get the other warnings and the @type error from Observation 4.
It appears that of the 39 datasets currently at https://discovery.biothings.io/dataset, Google Dataset Search is only indexing the Wellderly dataset: https://datasetsearch.research.google.com/search?query=site%3Adiscovery.biothings.io
Not sure if it's the only reason, but the Rich Results Test on https://discovery.biothings.io/dataset/da4905854c18028d gives a "Page partially loaded" error...