bskinn / sphobjinv

Toolkit for manipulation and inspection of Sphinx objects.inv files
https://sphobjinv.readthedocs.io
MIT License
78 stars 9 forks

idea: sphinx objects inventory shed #148

Open cblegare opened 3 years ago

cblegare commented 3 years ago

Hi there!

First, thank you very much for sphobjinv! It has proven most useful to me in countless situations.

Since you have been playing with inventories a lot, I would like your feedback on an idea: a Sphinx object inventory shed.

I am looking for ways to help colleagues write their technical documentation, especially in complex multi-platform environments (think cloud microservices). I find Sphinx is the best compromise for independent projects in multiple domains linking to each other, building a hypermedia documentation set. I also think that MkDocs is gaining traction because of its simplicity and because people love Markdown. The major downside there, in my opinion, is the lack of an inter-project reference engine (like intersphinx), especially since Sphinx users now have MyST for Markdown support.
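For context, the intersphinx wiring being referred to is just a mapping in a project's conf.py; the entries below are illustrative, not from any particular project:

```python
# conf.py (excerpt) -- minimal sketch of standard intersphinx configuration.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # "name": (docset base URL, inventory location -- None means fetch
    # <base URL>/objects.inv at build time)
    "python": ("https://docs.python.org/3", None),
    "sphinx": ("https://www.sphinx-doc.org/en/master", None),
}
```

With that in place, a cross-reference role such as ``:py:func:`json.dumps` `` resolves against the mapped docset's objects.inv, which is exactly the machinery MkDocs-based sites can't currently participate in.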

To help solve this, inspired by sphobjinv and typeshed, I am thinking of a Sphinx objects inventory shed (or soished for short).

This project could provide a few things:

My commitment to this would be quite modest, but I feel that within a few weeks I could put something together to get started.

Do you ( @bskinn ), or anyone reading this, have any opinion about that idea?

bskinn commented 3 years ago

Awesome! It's always great to hear that a project has been useful to someone other than just myself. :-)


High-level, I think the objective behind this (making it much easier for Sphinx documentarians to cross-reference into third-party, non-Sphinx document(ation) sets) is a really great idea -- Sphinx's robust cross-referencing functionality is really nice, and the recent emergence of MyST is a tremendous addition to the Sphinx ecosystem. A system that provides robust, searchable, discoverable objects.inv files for documentation/documents not written in Sphinx seems like it might be hugely valuable.

I have a number of thoughts percolating on the idea... I'll post further here once I've pulled them together. I think one of the key questions is whether it actually makes sense to implement it as a centralized repository, as opposed to a set of advanced tooling for auto-creating objects.inv files from webscraped data, which each Sphinx documentarian would configure and use themselves (part of this had already occurred to me at a high level; see #19). Freshness of the data in the objects.inv files seems likely to be a significant issue, as does the maintenance workload of configuring the webscraper spiders themselves.
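For a sense of what such auto-creation tooling would ultimately emit: the v2 objects.inv format is a four-line plain-text header followed by a zlib-compressed stream of `{name} {domain}:{role} {priority} {uri} {dispname}` lines, so a scraper's output stage can be sketched with the stdlib alone. The function name, project name, and object entries below are made up for illustration:

```python
import os
import tempfile
import zlib

def write_objects_inv(path, project, version, objects):
    """Write a Sphinx v2 objects.inv: 4-line text header + zlib-compressed body."""
    header = (
        "# Sphinx inventory version 2\n"
        f"# Project: {project}\n"
        f"# Version: {version}\n"
        "# The remainder of this file is compressed using zlib.\n"
    )
    # One data line per documented object:
    # {name} {domain}:{role} {priority} {uri} {dispname}
    body = "".join(
        f"{name} {domain}:{role} {priority} {uri} {dispname}\n"
        for name, domain, role, priority, uri, dispname in objects
    )
    with open(path, "wb") as f:
        f.write(header.encode("utf-8"))
        f.write(zlib.compress(body.encode("utf-8"), 9))

# Hypothetical scraped entries: a page, plus an anchor within it
inv_path = os.path.join(tempfile.gettempdir(), "objects.inv")
write_objects_inv(
    inv_path,
    "scraped-docs",
    "1.0",
    [
        ("intro", "std", "doc", "-1", "intro/", "Introduction"),
        ("setup", "std", "label", "-1", "intro/#setup", "Setup"),
    ],
)
```

In practice sphobjinv's own `Inventory` API would be the natural building block rather than hand-rolling the format; the sketch just shows how little the on-disk target is.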

cblegare commented 3 years ago

Thank you for this feedback. I am still wrapping my head around this myself. I feel that some hindrances to these objectives could be lifted with a few changes in sphinx.ext.intersphinx itself (https://github.com/sphinx-doc/sphinx/issues/5562, for example).

bskinn commented 3 years ago

Sorry for the slow reply on this!

I've been mulling this idea over, and aside from implementing a robust and simple way for people to set up suitable web scraping (I still like the name soiscraper for such a project...), I think the biggest challenge is finding a good way to manage freshness/staleness of the objects.inv files that are made available.

For Sphinx docsets, e.g. on ReadTheDocs, there's a guarantee of freshness in the objects.inv that lives with the docset. Every time the docs are built, a fresh objects.inv is produced, and so a user can be sure that the objects.inv they find with the docset is "fresh" ... it definitely contains an accurate representation of the documented artifacts and where they live in the HTML directory tree.

For an objects.inv that's created by the still-hypothetical soiscraper and hosted in a central shed, though, there are two flavors of "staleness" that can develop for it:

  1. Changes could have been made to the documentation set associated with the .inv, and so the soished's .inv is stale with respect to the actual documentation up on the web.
  2. If a documentarian downloaded that objects.inv some time ago (two days, two weeks, two months, ...), then their local copy may be stale with respect to the objects.inv that's currently hosted at soished.

If clean and sufficiently inexpensive ways can be found to manage these two kinds of staleness, then I think the idea is probably solid.

In terms of a soished implementation, Item 1 is the bigger deal, because if someone just sets their conf.py up to point at the soished objects.inv, then they'll get a fresh inventory whenever they build after make clean, or when using make -e, and that ~takes care of Item 2.

There are also cloud services cost aspects for both of these items:

CDN caching would probably make sense for Item 2.

Some sort of intelligent microscraping of the website hierarchy under a given documentation root, built into soiscraper or soished or both, that provides a guess as to whether it has been updated since the last scrape, might help reduce cloud billing beyond simply setting a conservative re-scrape interval (whether fixed or customizable per target docset). I don't have much experience with cloud services, though, so it could be that the web scraping is low-traffic enough that it makes the most sense to just fully rescrape and regenerate on, say, a six-hour interval.
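That "has it been updated since the last scrape?" guess can often be made cheaply with standard HTTP validators rather than any cloud-specific machinery: issue a HEAD request carrying `If-None-Match` / `If-Modified-Since`, and a 304 response means the target is unchanged and the full re-scrape can be skipped. A sketch, with function names that are mine rather than from any existing tool:

```python
import urllib.error
import urllib.request

def fetch_validators(url, etag=None, last_modified=None):
    """HEAD request with HTTP conditional headers.

    Returns (changed, etag, last_modified). A 304 response means the
    server considers the resource unchanged, so a full re-scrape can
    be skipped without downloading any page bodies.
    """
    req = urllib.request.Request(url, method="HEAD")
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return True, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified
            return False, etag, last_modified
        raise

def needs_rescrape(stored, fresh):
    """Pure comparison of cached validators vs. freshly fetched ones."""
    if stored.get("etag") and fresh.get("etag"):
        return stored["etag"] != fresh["etag"]
    if stored.get("last_modified") and fresh.get("last_modified"):
        return stored["last_modified"] != fresh["last_modified"]
    return True  # no validators available: fall back to a fixed interval
```

This only works where the docs host emits ETag or Last-Modified headers, so the conservative fixed interval would still be needed as a fallback.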