dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Better indexing for search engines #752

Open satra opened 2 years ago

satra commented 2 years ago

At present the web ui is not well indexed by google. i've submitted a request. other options are:

  1. to prerender: i can enable prerendering on netlify or we can run this https://prerender.io/ for more custom capability
  2. submit a sitemap separately. i'm thinking of generating this automatically using the datalad job for dandiarchive.org
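For option 2, a rough sketch of what automatic sitemap generation could look like, using only the Python standard library. The function name `build_sitemap`, the `https://dandiarchive.org/dandiset/<id>` URL pattern, and the sample identifiers are assumptions made for illustration; in practice the identifiers would come from whatever job already lists the dandisets.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(dandiset_ids, base_url="https://dandiarchive.org/dandiset"):
    """Return sitemap XML (as a string) with one <url> entry per dandiset.

    `base_url` is an assumed landing-page URL pattern, not a confirmed one.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for identifier in dandiset_ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/{identifier}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Illustrative identifiers only.
    print(build_sitemap(["000003", "000004"]))
```

The resulting file could then be served as `sitemap.xml` and submitted through the search console.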

also we need to inject the dandiset metadata into the html of the dandiset landing page so that google dataset search can pick it up.

waxlamp commented 2 years ago

(Huh, I thought I had filed an issue about this 🐱)

I know very little about this topic. Netlify or external prerendering seems like a fit for our app. A dynamic sitemap that indexes the DLPs was the original idea I was going to run down. I don't know how to evaluate the two types of approach, but perhaps @brianhelba knows something about it. I believe others at Kitware have also done this type of thing before.

brianhelba commented 2 years ago

I don't have any experience with SEO for SPAs, sorry.

waxlamp commented 2 years ago

There's some seemingly good info in this article: https://madewithvuejs.com/blog/how-to-make-vue-js-single-page-applications-seo-friendly-a-beginner-s-guide

Some tools to consider:

yarikoptic commented 1 year ago

I would like to revive interest in getting dandiarchive "indexed" by google and its dataset search. I think it would also be valuable because we could then recommend that record (schema.org / google dataset description) as a way for other indexers (e.g. https://www.re3data.org/) to automate updating metadata about dandi -- right now it is a stone age "adjust stats in the text form" kinda approach.

Maybe we could even start not with a full listing of dandisets but with just an overall record with stats (which we gather/request already), e.g. following the example in https://developers.google.com/search/docs/appearance/structured-data/dataset but

satra commented 1 year ago

all that needs to happen is to insert the markup when rendering the DLP. see related issue here: #784

the fact that we are using jsonld for our metadata makes it very easy to inject our metadata into the DLP (the instructions are at the same place as the google link that yarik shared above).

the api side doesn't have any interaction with google dataset search or google search as far as i know. i'm sure they mine it.

satra commented 1 year ago

all we need to do is stick our dandiset metadata, which is jsonld, into our DLP generator code using a script tag in the head section.
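A minimal sketch of that injection step, written in Python for illustration only (the DLP itself is rendered by the web app, so the real change would live in the frontend code). The helper name `jsonld_script_tag` and the sample record are hypothetical.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Serialize a metadata dict as a JSON-LD <script> element for the page <head>."""
    payload = json.dumps(metadata, indent=2)
    # "<" can only occur inside JSON string values, so escaping it is safe and
    # prevents a literal "</script>" in the metadata from closing the element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical record; a real one would be the dandiset metadata.
    sample = {"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dandiset"}
    print(jsonld_script_tag(sample))
```

The `application/ld+json` script type is what the Google structured-data instructions linked above expect for embedded JSON-LD.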

yarikoptic commented 9 months ago

yeap,... let's just have it done. FWIW -- here is the location where openneuro injects such a record: https://github.com/OpenNeuroOrg/openneuro/blob/ba6297fd6061e9038ba199627e06b9abe951bef1/packages/openneuro-app/src/scripts/dataset/snapshot-container.tsx#L100 .

What do we actually need to do to our dandiset metadata record for it to become a "proper" record for google's dataset discovery... would it properly consume our @context? should we at least add @type?

I naively took a sample metadata record and added it into the <head> of a sample html page I posted on https://neuro.debian.net/_files/testld.html and pointed https://search.google.com/test/rich-results/result?id=-9qISQt3fuenNtdosFJ3rA to it but it said that there is no rich metadata.

looking at it in the jsonld playground ![image](https://github.com/dandi/dandi-archive/assets/39889/c508ef26-2047-4f11-9019-d71ca8c3d0c7)

it expands to "@type": "http://schema.dandiarchive.org/Dandiset", so not schema.org's Dataset, so maybe that is why it refuses? (unlikely I guess).

FWIW: for a sample openneuro dataset it does load up structured data nicely ![image](https://github.com/dandi/dandi-archive/assets/39889/f60b81ae-aaa6-40bc-8346-1c800647797a)

satra commented 9 months ago

the type is a key difference. but there are others i think. also i don't know if google supports json-ld 1.1. it didn't seem to want to expand type in their test setup. but there should be some translation possible. we may need to start with the most essential fields @id, @type, description and then add the rest.
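One way such a translation might start, as a sketch only: build a small schema.org `Dataset` record from a dandiset metadata dict and add fields incrementally. The function name `to_schema_org_dataset` and the input key names (`id`, `name`, `description`, `url`) are assumptions about the record layout, and the sample values are made up. Note that the Google dataset documentation linked earlier lists `name` and `description` as required properties.

```python
def to_schema_org_dataset(dandiset: dict) -> dict:
    """Build a minimal schema.org Dataset record from a dandiset metadata dict."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
    }
    # (input key, output key) pairs; the input key names are assumed, not confirmed.
    field_map = (
        ("id", "@id"),
        ("name", "name"),
        ("description", "description"),
        ("url", "url"),
    )
    for source_key, target_key in field_map:
        value = dandiset.get(source_key)
        if value:  # copy only fields that are present and non-empty
            record[target_key] = value
    return record


if __name__ == "__main__":
    # Made-up sample record for illustration.
    sample = {
        "id": "DANDI:000000/draft",
        "name": "Example dandiset",
        "description": "An example description.",
        "schemaKey": "Dandiset",
    }
    print(to_schema_org_dataset(sample))
```

Starting from a fixed `@context` and `@type` sidesteps the question of whether the test tool expands the DANDI context; whether the output passes the rich results test would still need to be checked.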

bendichter commented 4 months ago

@alessandratrapani and I just tried googling "Recordings from medial entorhinal cortex during linear track and open exploration dandi". This is the title of an old dandiset + "dandi". Nothing came up regarding the DANDI archive. Could we put this on the roadmap? Are there any blockers?

satra commented 4 months ago

google does index dandiarchive. see this search: `recordings site:dandiarchive.org`. i know this is not helpful to a user, but it demonstrates that dandi is indexed in google. the question of whether it's the most relevant response falls under seo, and perhaps a sitemap would help. since google already indexes the site, and given the things on our plate, i would rather focus on our other features unless there are low-hanging changes we can make.

bendichter commented 4 months ago

Confirmed on my end:

image

And it makes sense that there are higher priorities right now. It would be great if anyone who knows about SEO could chime in if there is any low-hanging fruit here.

kabilar commented 4 months ago

Thanks for the report, Ben. This feature would certainly improve the user experience. I will add this to the backlog so that we can tackle it after our ongoing initiatives.

Interestingly, this KnowledgeSpace query is the first result returned by Google for "Recordings from medial entorhinal cortex during linear track and open exploration dandi". Curious as to what they are doing differently, although this Dandiset is at the bottom of the KnowledgeSpace query page.