Open cblegare opened 3 years ago
Awesome! It's always great to hear that a project has been useful to someone other than just myself. :-)
High-level, I think the objective behind this (making it much easier for Sphinx documentarians to cross-reference into third-party, non-Sphinx document(ation) sets) is a really great idea -- Sphinx's robust cross-referencing functionality is really nice, and the recent emergence of MyST is a tremendous addition to the Sphinx ecosystem. A system that provides robust, searchable, discoverable `objects.inv` files for documentation/documents not written in Sphinx seems like it might be hugely valuable.
I have a number of thoughts percolating on the idea... I'll post further here once I've pulled them together. I think one of the key questions is whether it actually makes sense to try to implement it as a centralized repository, as opposed to a set of advanced tooling for auto-creation of `objects.inv` files from webscraped data (part of this had already occurred to me at a high level; see #19), which each Sphinx documentarian would configure and use themselves... freshness of the data in the `objects.inv` files seems likely to be a significant issue, as does the maintenance workload of configuring the webscraper spiders themselves.
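As a point of reference for such tooling, the v2 inventory format is simple enough that a scraper could emit it with nothing but the standard library. A minimal sketch (the `write_inventory` helper and the MDN entry are made up for illustration, not an existing API):

```python
# Minimal sketch of writing a Sphinx v2 objects.inv from scraped data,
# using only the standard library. The MDN "Array" entry is illustrative.
import zlib

def write_inventory(path, project, version, entries):
    """entries: iterable of (name, domain:role, priority, uri, dispname)."""
    # Plain-text header, per the v2 inventory format.
    header = (
        "# Sphinx inventory version 2\n"
        f"# Project: {project}\n"
        f"# Version: {version}\n"
        "# The remainder of this file is compressed using zlib.\n"
    ).encode("utf-8")
    # One "name domain:role priority uri dispname" line per object,
    # zlib-compressed as a single block.
    data = "".join(
        f"{name} {role} {prio} {uri} {disp}\n"
        for name, role, prio, uri, disp in entries
    ).encode("utf-8")
    with open(path, "wb") as f:
        f.write(header + zlib.compress(data, 9))

write_inventory(
    "objects.inv", "mdn-js", "1.0",
    [("Array", "js:class", "1",
      "Web/JavaScript/Reference/Global_Objects/Array", "-")],
)
```

In practice `sphobjinv`'s own `Inventory`/conversion machinery would be the natural base for this; the point is only that the output format itself is not a hurdle.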
Thank you for this feedback. I am still working my mind around this myself. I feel that some hindrances to this objective could be lifted with a few changes in `sphinx.ext.intersphinx` itself (like https://github.com/sphinx-doc/sphinx/issues/5562, as an example).
Sorry for the slow reply on this!
I've been mulling this idea over, and aside from implementing a robust and simple way for people to set up suitable web scraping (I still like the name `soiscraper` for such a project...), I think the biggest challenge is finding a good way to manage freshness/staleness of the `objects.inv` files that are made available.
For Sphinx docsets, e.g. on ReadTheDocs, there's a guarantee of freshness in the `objects.inv` that lives with the docset. Every time the docs are built, a fresh `objects.inv` is produced, and so a user can be sure that the `objects.inv` they find with the docset is "fresh": it definitely contains an accurate representation of the documented artifacts and where they live in the HTML directory tree.
For an `objects.inv` that's created by the still-hypothetical `soiscraper` and hosted in a central shed, though, there are two flavors of "staleness" that can develop for it:

1. The upstream documentation may have been updated since the last scrape that generated the `.inv`, and so the soished's `.inv` is stale with respect to the actual documentation up on the web.
2. If a user downloaded the `objects.inv` some time ago (two days, two weeks, two months, ...), then their local copy may be stale with respect to the `objects.inv` that's currently hosted at soished.

If clean and sufficiently inexpensive ways can be figured out to manage these two stalenesses, then I think the idea is probably solid.
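For the second flavor (a user's local copy lagging the hosted one), the client-side check could be as simple as comparing the remote file's `Last-Modified` header against the time the local copy was fetched. A sketch of just the decision logic (the helper name is mine, not an existing `sphobjinv` or intersphinx API; a real version would get the header from a HEAD request):

```python
# Sketch: is a locally cached objects.inv stale relative to the copy hosted
# at the (hypothetical) soished? Compares the server's Last-Modified header
# against the local fetch time. Not an existing API.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def local_copy_is_stale(last_modified_header: str,
                        local_fetch_time: datetime) -> bool:
    """True if the hosted inventory changed after the local copy was fetched."""
    remote_mtime = parsedate_to_datetime(last_modified_header)
    return remote_mtime > local_fetch_time

# Example: local copy fetched Oct 1, remote last modified Oct 21 -> stale.
stale = local_copy_is_stale(
    "Wed, 21 Oct 2015 07:28:00 GMT",
    datetime(2015, 10, 1, tzinfo=timezone.utc),
)
```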
In terms of a `soished` implementation, Item 1 is the bigger deal, because if someone just sets their `conf.py` up to point at the soished `objects.inv`, then they'll get a fresh inventory whenever they build after `make clean`, or using `make -e`, and that mostly takes care of Item 2.
There are also cloud-services cost aspects for both of these items:

1. Scraping/compute costs accrue if the `objects.inv` files are rebuilt on a frequent basis.
2. Every time a user builds after `make clean` or with the `-e` flag, the remote `objects.inv` files get downloaded again.

CDN caching would probably make sense for Item 2.
Some sort of intelligent microscraping of the website hierarchy under documentation, built into `soiscraper` or `soished` or both, that provides a guess as to whether it's been updated since the last scrape, might help reduce cloud billing beyond simply setting a conservative re-scrape interval (whether fixed or customizable per target docset). I don't have much experience with cloud, though, so it could be that the web scraping is low-traffic enough that it makes the most sense to just fully rescrape and regenerate on, say, a six-hour interval.
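One cheap version of that "has it changed since the last scrape" guess would be to record the HTTP validators (`ETag`/`Last-Modified`) from the previous scrape and compare them against what a HEAD request returns now. A sketch of only the decision logic (the function name and header-dict shape are illustrative; a real version would issue the HEAD request over the network):

```python
# Sketch: decide whether to rescrape a docset by comparing HTTP validators
# recorded at the previous scrape against the values a HEAD request returns
# now. Illustrative only, not an existing API.

def should_rescrape(previous: dict, current: dict) -> bool:
    """previous/current: header maps like {"ETag": ..., "Last-Modified": ...}."""
    for validator in ("ETag", "Last-Modified"):
        old, new = previous.get(validator), current.get(validator)
        if old is not None and new is not None:
            # Validator present on both sides: trust it.
            return old != new
    # No usable validator: rescrape to be safe.
    return True

# A site that exposes an ETag and hasn't changed -> skip the rescrape.
skip = not should_rescrape({"ETag": '"abc123"'}, {"ETag": '"abc123"'})
```

This degrades gracefully: sites that send no validators simply fall back to the fixed re-scrape interval.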
Hi there!
First, thank you a lot for `sphobjinv`! It has proven most useful in countless situations for me.

Since you have been playing with inventories a lot, I would like your feedback on an idea: a Sphinx object inventory shed.
I am looking for ways to help colleagues write their technical documentation, especially in complex multi-platform environments (think cloud microservices). I find Sphinx is the best compromise for independent projects in multiple domains linking to each other, building a hypermedia documentation set. I also think that MkDocs is gaining traction because of its simplicity and because people love Markdown. The major downside there, in my opinion, is the lack of an inter-project reference engine (like intersphinx), especially since Sphinx users now have MyST for Markdown support.
To help solve this, inspired by sphobjinv and typeshed, I am thinking of a Sphinx objects inventory shed (or `soished` for short). This project could provide a few things:

- tooling around `sphobjinv` to help generate `objects.inv` files
- `objects.inv` files for major projects lacking `objects.inv` (like JavaScript primitives from Mozilla MDN)

My commitment to this would be quite modest, but I feel that within a few weeks I could put something together to get started.
Do you (@bskinn), or anyone reading this, have any opinion about that idea?