Open pbuttigieg opened 1 year ago
Approach - build in this repo a GH action that - given a list of links to directories of JSON-LD files across the other repos in this org - will extract their raw links and add them to a sitemap hosted in this repo.
@kpitz Once we have that sitemap, we would register it in ODISCat and BeBOP will be plugged in to the ODIS system and more of the Decade's data exchange systems. After that, we should consider how to a) align the ODIS template to MIOP and b) have some similar metadata directories in the other repositories.
Here is a script to generate a sitemap from JSON-LD files in a GitHub repo (apologies for my PHP use, but I initially created this back in 2011): https://github.com/iodepo/odis-arch/blob/master/collection/scripts/generate_sitemap.php
Example sitemap.xml output for that TechOceanS repo is: https://github.com/iodepo/odis-arch/blob/master/collection/tempHosting/data-TechOceanS/sitemap.xml
(A few ODIS partners are using that script.) Let me know if a Python script would be preferred instead.
@pbuttigieg @jmckenna We have a much simpler route to all this via the GitHub API...
If you try
curl "https://api.github.com/repos/iodepo/odis-arch/contents"
or
curl "https://api.github.com/repos/BeBOP-OBON/TechOceanS_protocol_collection/contents/odis_metadata?ref=main"
You will get back the GitHub API JSON response.
Then use jq (if you don't have jq, you should get it from https://stedolan.github.io/jq/; it is a must-have in the toolbox):
curl -s "https://api.github.com/repos/BeBOP-OBON/TechOceanS_protocol_collection/contents/odis_metadata?ref=main" | jq -r '.[] | .download_url'
This returns the raw URLs for all the files in that directory with one simple command. From here it is trivial to build out the sitemap XML.
All of this is also simple to work up as a GitHub action.
So we should be able to do all this with a small Bash script or a simple Python program.
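As a rough illustration of the Python option, here is a minimal sketch that turns a contents-API listing into sitemap XML. The function names (`fetch_listing`, `raw_urls`, `build_sitemap`) and the `.json`/`.jsonld` suffix filter are assumptions made for this example, not anything already in the repos; it relies only on the `type`, `name` and `download_url` fields that the GitHub contents API returns.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_listing(owner, repo, path, ref="main"):
    """Fetch a directory listing from the GitHub contents API (network call)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def raw_urls(listing, suffixes=(".json", ".jsonld")):
    """Return download_url for each file entry whose name ends in one of
    `suffixes` (an assumed filter); directory entries have no download_url."""
    return [
        entry["download_url"]
        for entry in listing
        if entry.get("type") == "file"
        and entry.get("download_url")
        and entry.get("name", "").endswith(suffixes)
    ]


def build_sitemap(urls):
    """Serialise URLs as a sitemaps.org <urlset>, de-duplicated and sorted."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    listing = fetch_listing(
        "BeBOP-OBON", "TechOceanS_protocol_collection", "odis_metadata"
    )
    print(build_sitemap(raw_urls(listing)))
```

Run as a script, this should print a sitemap for the TechOceanS directory; redirecting the output to `sitemap.xml` would be one way to produce the file a GitHub action could commit.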
Thanks @jmckenna @fils
@fils I like the API approach - very lean. Can we set that up as a GH action in this repo that scans all repos in the BeBOP Org and outputs an ubersitemap for OIH harvesting?
That would give us a single target for ODISCat.
@pbuttigieg The API should easily support that. Note that it takes a direct source directory as an argument, though you could build a crawler. However, the easiest approach would be to simply know the target URLs for the directories and feed those into a program.
Then making a GitHub action would not be too hard. @jmckenna I'm happy to let you combine the approaches you have already developed with the API calls to generate the sitemaps, if you wish.
"However, easiest would be to simply know the target URLs for the directories and feed that into a program."
I think we can just scan all repos in this Org for ODIS metadata directories, set up just like the TechOceanS exemplar.
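One way such an org-wide scan might look, sketched under assumptions: every repo keeps its JSON-LD in a top-level directory named `odis_metadata` (as in the TechOceanS exemplar), and the helper names `get_json`, `org_repo_names` and `collect_metadata_urls` are invented for this example. It uses the GitHub REST endpoints `/orgs/{org}/repos` and `/repos/{owner}/{repo}/contents/{path}`; omitting `ref` makes the contents call use each repo's default branch.

```python
import json
import urllib.error
import urllib.request

API = "https://api.github.com"


def get_json(url):
    """GET a GitHub API URL and parse the JSON; return None on HTTP 404."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def org_repo_names(org, fetch=get_json):
    """List repository names in an org, following page numbers until empty."""
    names, page = [], 1
    while True:
        batch = fetch(f"{API}/orgs/{org}/repos?per_page=100&page={page}")
        if not batch:
            return names
        names.extend(repo["name"] for repo in batch)
        page += 1


def collect_metadata_urls(org, metadata_dir="odis_metadata", fetch=get_json):
    """Gather raw file URLs from `metadata_dir` in every repo that has one."""
    urls = []
    for name in org_repo_names(org, fetch):
        listing = fetch(f"{API}/repos/{org}/{name}/contents/{metadata_dir}")
        if not isinstance(listing, list):
            # None (no such directory) or a single-file object: skip it.
            continue
        urls.extend(e["download_url"] for e in listing if e.get("download_url"))
    return sorted(urls)


if __name__ == "__main__":
    for url in collect_metadata_urls("BeBOP-OBON"):
        print(url)
```

Unauthenticated API calls are limited to 60 requests per hour, so a GitHub action running this would probably want to send a token in an `Authorization` header; that part is left out of the sketch.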
@jmckenna @fils is this now automated?
xref https://github.com/iodepo/odis-arch/issues/145
@fils this would be the place to set up an organisation-wide sitemap to harvest ODIS-Arch JSON-LD.
TechOceanS is leading the way and has some JSON-LD that should fit our Documents pattern. The target directory is here: https://github.com/BeBOP-OBON/TechOceanS_protocol_collection/tree/main/odis_metadata
I would assume other repos and orgs would have a similar approach, if they host the JSON-LD in GitHub. If they don't, the sitemap should let us know anyway.