clowder-framework / clowder

A data management system that allows users to share, annotate, organize and analyze large collections of datasets. It provides support for extensible metadata annotation using JSON-LD and a distributed analytics event bus for automatic curation of uploaded data.
https://clowderframework.org/
University of Illinois/NCSA Open Source License

Insert metadata in (dataset) pages for (Google) Dataset Search #335

Open: MBcode opened this issue 2 years ago

MBcode commented 2 years ago

Is your feature request related to a problem? Please describe.
I would like to be able to find Clowder datasets in https://datasetsearch.research.google.com/ and via other aggregators.

Describe the solution you'd like
Insert JSON-LD into at least the dataset pages, and then add an associated entry to the sitemap so those pages can be crawled.
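A minimal sketch of what "insert JSON-LD into the dataset pages" can look like: the page carries a `<script type="application/ld+json">` element whose body is a schema.org `Dataset` object, which is the form Google Dataset Search reads. The function name and its parameters are hypothetical and not Clowder's actual template code (Clowder's views are Scala templates; Python is used here only for brevity).

```python
import json


def dataset_jsonld_script(name, description, url, keywords=None):
    """Return an HTML <script> element carrying schema.org Dataset JSON-LD.

    Hypothetical helper: the arguments stand in for whatever fields the
    real dataset page has available.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if keywords:
        doc["keywords"] = list(keywords)
    # Escape "</" as "<\/" (still valid JSON) so a value containing
    # "</script>" cannot terminate the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


print(dataset_jsonld_script("Soil samples", "Example dataset",
                            "https://clowder.example.org/datasets/abc", ["soil"]))
```

Escaping `</` matters because dataset names and descriptions are user-supplied text that ends up inside a `<script>` element.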

Describe alternatives you've considered
Given the potentially very large number of datasets and files, we need to expose just enough metadata to land in the right area, and then let clients follow the Linked Data "follow your nose" pattern. This might mean allowing (API) calls to fetch file (and other) metadata as needed.
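One way to read the "just enough metadata, then follow your nose" idea: the dataset's JSON-LD lists each file only by a dereferenceable `@id`, and a client that wants details fetches that URL. This is a sketch under assumptions; the `/api/files/{id}/metadata.jsonld` URL pattern and the function name are hypothetical placeholders, not a confirmed Clowder route.

```python
def dataset_summary(base_url, dataset_id, name, file_ids):
    """Build a compact schema.org Dataset that links to, rather than embeds,
    per-file metadata. The URL patterns below are assumptions."""
    base = base_url.rstrip("/")
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": f"{base}/datasets/{dataset_id}",
        "name": name,
        # Only a link per file: a crawler or client that needs more detail
        # dereferences each @id instead of receiving every file's full
        # metadata inline, which keeps large datasets' pages small.
        "hasPart": [
            {"@id": f"{base}/api/files/{fid}/metadata.jsonld"} for fid in file_ids
        ],
    }


print(dataset_summary("https://clowder.example.org/", "ds1", "Demo", ["f1", "f2"]))
```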

Additional context
This will be done in stages, starting with the mapping of the Dataset and File classes, which is in a draft PR.

MBcode commented 2 years ago

The sitemap will come later, as will some of the other possible metadata. Right now I am starting with two classes, File and Dataset, to produce JSON-LD describing their attributes mapped to schema.org vocabulary terms. In the end, the JSON-LD emitted by the two *.scala.html templates can be copied from the rendered HTML and pasted into https://validator.schema.org, so that once the sitemap is in place, all of those elements from the linked Datasets could end up in https://datasetsearch.research.google.com/
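The attribute-to-vocabulary mapping described above can be sketched as a lookup table from File fields to schema.org terms. The left-hand field names are assumed, illustrative names rather than a verified copy of Clowder's File model; the right-hand terms (`name`, `encodingFormat`, `contentSize`, `uploadDate`, `description`) are real schema.org properties applicable to a `DataDownload`.

```python
from datetime import datetime

# Hypothetical File field names (left) mapped to schema.org terms (right).
FILE_TO_SCHEMA = {
    "filename": "name",
    "contentType": "encodingFormat",
    "length": "contentSize",
    "uploadDate": "uploadDate",
    "description": "description",
}


def file_to_jsonld(record):
    """Translate a dict of File attributes into schema.org DataDownload JSON-LD."""
    doc = {"@context": "https://schema.org/", "@type": "DataDownload"}
    for field, term in FILE_TO_SCHEMA.items():
        value = record.get(field)
        if value is None:
            continue  # omit attributes the record does not have
        if isinstance(value, datetime):
            value = value.isoformat()  # ISO 8601 string for dates
        elif term == "contentSize":
            value = str(value)  # schema.org types contentSize as Text
        doc[term] = value
    return doc


print(file_to_jsonld({"filename": "a.csv", "contentType": "text/csv", "length": 2048}))
```

The resulting JSON is the kind of output that can be pasted into the schema.org validator to check the mapping.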

MBcode commented 2 years ago

Dataset, File, and the classes whose instances they hold all have to_jsonld methods, which are called from the view; the only other change was to the signature of Utils.baseURL.
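The "every class has a to_jsonld method" design composes naturally: a Dataset's method calls the methods of the objects it holds and nests their output. The sketch below assumes simplified, hypothetical `Author`, `File` and `Dataset` classes and passes the base URL in explicitly, loosely echoing the Utils.baseURL signature change; it is not the PR's actual Scala code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    name: str

    def to_jsonld(self, base_url):
        # base_url is unused here but kept so every to_jsonld shares one signature.
        return {"@type": "Person", "name": self.name}


@dataclass
class File:
    id: str
    filename: str

    def to_jsonld(self, base_url):
        return {"@type": "DataDownload", "@id": f"{base_url}/files/{self.id}",
                "name": self.filename}


@dataclass
class Dataset:
    id: str
    name: str
    author: Author
    files: List[File] = field(default_factory=list)

    def to_jsonld(self, base_url):
        base = base_url.rstrip("/")
        # Delegate to the contained objects and nest their JSON-LD.
        return {
            "@context": "https://schema.org/",
            "@type": "Dataset",
            "@id": f"{base}/datasets/{self.id}",
            "name": self.name,
            "creator": self.author.to_jsonld(base),
            "distribution": [f.to_jsonld(base) for f in self.files],
        }


ds = Dataset("d1", "Demo", Author("Ada"), [File("f1", "a.csv")])
print(ds.to_jsonld("https://clowder.example.org/"))
```

Keeping each mapping on its own class means a new class (for example, one added in a later stage) only needs its own to_jsonld to join the output.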

MBcode commented 2 years ago

After this is accepted, the next step is issue #351: generating the sitemap so the datasets can be crawled.
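A sitemap in the sitemaps.org XML format is a `urlset` of `url` entries, each with a `loc` and an optional `lastmod`. This is a minimal sketch, assuming dataset pages live at a hypothetical `{base}/datasets/{id}` path; the protocol caps a single sitemap file at 50,000 URLs, so a large instance would need several files plus a sitemap index, which this sketch does not build.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(base_url, datasets):
    """datasets: iterable of (dataset_id, last_modified_date_or_None)."""
    base = base_url.rstrip("/")
    entries = list(datasets)
    if len(entries) > MAX_URLS:
        raise ValueError("too many URLs for one sitemap; use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for ds_id, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # Hypothetical page URL pattern for a dataset.
        ET.SubElement(url, "loc").text = f"{base}/datasets/{ds_id}"
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap("https://clowder.example.org/", [("d1", date(2021, 6, 1)), ("d2", None)]))
```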

MBcode commented 2 years ago

The PR comments have been closed for a while and there has not been much feedback on the sitemap, so I will look at other issues too.