Open iannesbitt opened 1 year ago
Also potentially useful for testing #35
Example server tree:
├── metadata
│ ├── CANWIN.jsonld
│ ├── HAKAI_IYS.jsonld
│ ├── HD-301-response.jsonld
│ └── HD-redirect.jsonld
├── robots.txt
└── sitemap.xml
Content negotiation using Django REST framework: https://www.django-rest-framework.org/api-guide/content-negotiation/
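For a test fixture, the negotiation step itself can be sketched without any framework. The snippet below is a minimal standard-library illustration, not Django REST framework's implementation: `choose_media_type` and its default `offered` list are hypothetical names, and it only handles the `q` parameter of the `Accept` header.

```python
def _parse_accept(accept):
    """Split an Accept header into (media_range, q) pairs."""
    prefs = []
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparseable q as "not acceptable"
        prefs.append((media_range, q))
    return prefs


def _quality(candidate, prefs):
    """Return the q-value of the most specific range matching candidate."""
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == candidate:
            specificity = 2
        elif media_range == candidate.split("/")[0] + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_media_type(accept, offered=("text/html", "application/ld+json")):
    """Pick the offered type the client prefers, or None if none is acceptable."""
    prefs = _parse_accept(accept)
    if not prefs:
        return offered[0]  # no Accept header: any representation is fine
    best, best_q = None, 0.0
    for candidate in offered:
        q = _quality(candidate, prefs)
        if q > best_q:  # ties keep the earlier entry in `offered`
            best, best_q = candidate, q
    return best
```

A harvester that sends `Accept: application/ld+json` would get the JSON-LD representation, while a browser-style header would get HTML from the same URL.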
Removing label as this is not necessarily related to a version.
Related to: #21, #23
After working with this software for a while, I'm becoming aware that there are many valid site configurations out there that we are unable to navigate due to the limitations of the spider and harvesting system.
Given the above planned features for the spider, it would improve code testing significantly to set up a simple web server with a robots.txt and sitemap.xml at the base that delivers content in some of the ways commonly used by data repositories. For example, this would let us test the navigation of JavaScript elements that render JSON-LD content after the page is loaded (e.g. MagIC DataONEorg/member-repos#16), an application/ld+json delivery system (e.g. Harvard Dataverse DataONEorg/member-repos#52), some valid but alternative configurations of schema.org data (e.g. CanWIN DataONEorg/member-repos#67), and perhaps some misconfigured robots.txt scenarios (e.g. Borealis DataONEorg/member-repos#51), without needing to crawl the repositories themselves.
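As a starting point, such a server could be sketched with only the Python standard library. Everything below is an assumption for illustration: the `FixtureHandler` and `start_server` names, the placeholder JSON-LD record, and in particular the guess that `HD-redirect.jsonld` is meant to answer with a 301 pointing at `HD-301-response.jsonld`. It covers the robots.txt, sitemap.xml, and application/ld+json cases only; JavaScript-rendered pages would need separate fixtures.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder record; real fixtures would be copied from the repositories.
DATASET = {"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
# Paths mirror the example server tree; the redirect mapping is a guess.
SERVED = [
    "/metadata/CANWIN.jsonld",
    "/metadata/HAKAI_IYS.jsonld",
    "/metadata/HD-301-response.jsonld",
]
REDIRECTS = {"/metadata/HD-redirect.jsonld": "/metadata/HD-301-response.jsonld"}


class FixtureHandler(BaseHTTPRequestHandler):
    def _send(self, status, content_type, body):
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Build absolute URLs from the Host header so any port works.
        base = f"http://{self.headers.get('Host', 'localhost')}"
        if self.path == "/robots.txt":
            body = f"User-agent: *\nAllow: /\nSitemap: {base}/sitemap.xml\n"
            self._send(200, "text/plain", body)
        elif self.path == "/sitemap.xml":
            urls = "".join(
                f"<url><loc>{base}{path}</loc></url>"
                for path in SERVED + list(REDIRECTS)
            )
            body = (
                '<?xml version="1.0" encoding="UTF-8"?>'
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
                f"{urls}</urlset>"
            )
            self._send(200, "application/xml", body)
        elif self.path in REDIRECTS:
            self.send_response(301)
            self.send_header("Location", REDIRECTS[self.path])
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in SERVED:
            self._send(200, "application/ld+json", json.dumps(DATASET))
        else:
            self._send(404, "text/plain", "not found\n")

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_server():
    """Start the fixture server on a free local port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), FixtureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a test, `server = start_server()` followed by `server.server_address[1]` gives the port to point the spider at, and `server.shutdown()` stops it afterwards. Misconfigured robots.txt scenarios could be added as further branches in `do_GET`.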