VizierDB / vizier-scala

The Vizier kernel-free notebook programming environment

Query support for provenance. #232

Open okennedy opened 1 year ago

What pain point is this feature intended to address? Please describe.

When describing Vizier's capabilities to others, I frequently find myself saying "we collect the information needed to answer that question, but don't have a UI through which the user can access it." I've created issues for several of these situations, but it's not realistic to expect that we'll ship every requested provenance-exploration feature in any reasonable timeframe. It would be more powerful to provide an API through which others can explore a notebook's provenance to answer specific questions. For example:

  1. When did this dataset change? (i.e., git bisect for workflows)
  2. When was this cell added?
  3. What is the difference between the dataset at version X and at version Y?
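
Question 1 is essentially a binary search over a notebook's history. A minimal sketch in Python, assuming a hypothetical read-only WorkflowVersion snapshot that carries a content fingerprint for each dataset; neither that class nor the helpers bisect_versions and first_change exist in Vizier today:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence


# HYPOTHETICAL: a read-only snapshot of one workflow version. This stands in
# for whatever a provenance query API might return; it is not a Vizier class.
@dataclass(frozen=True)
class WorkflowVersion:
    version_id: int
    # dataset name -> content fingerprint (e.g., a hash of its rows); assumed
    datasets: Dict[str, str] = field(default_factory=dict)


def bisect_versions(versions: Sequence[WorkflowVersion],
                    is_bad: Callable[[WorkflowVersion], bool]) -> Optional[int]:
    """Return the index of the first 'bad' version, git-bisect style.

    Assumes versions are in chronological order, versions[0] is 'good', and
    once a version is 'bad' every later one is too. Returns None if the last
    version is still 'good'.
    """
    if not versions or not is_bad(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return hi


def first_change(versions: Sequence[WorkflowVersion],
                 dataset: str) -> Optional[int]:
    """version_id of the first version whose `dataset` differs from versions[0]."""
    if not versions:
        return None
    baseline = versions[0].datasets.get(dataset)
    idx = bisect_versions(versions,
                          lambda v: v.datasets.get(dataset) != baseline)
    return None if idx is None else versions[idx].version_id
```

For example, with ten versions in which the fingerprint of a dataset named "sales" changes at version 5, first_change returns 5 after checking only four versions. As with git bisect, the monotonicity assumption matters: if a dataset changes and later reverts, the search reports some change boundary, not necessarily the first one.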

Describe the solution you'd like

This ticket is the starting point for a discussion of potential solutions. A few ideas:

  1. SQL access to a view over the workflows of a particular project. (Pro: easy to use. Con: capability ramp; SQL is awkward for iterative computations, and some types of constraints would be hard to specify programmatically.)
  2. A Vizier 'shell' that you can drop into to explore the notebook and its history. The interface would be something like MutableProject without the mutability, but with the ability to set up iteration, binary search, etc. across branches or workflow versions. On that note, why limit ourselves to immutable versions?
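
To make idea 1 more concrete, here is a minimal sketch of the kind of view it could expose, using Python's built-in sqlite3 module. The tables workflows and workflow_cells, their columns, and the helpers cell_added and demo_connection are all hypothetical; in particular, the sketch assumes each cell carries an identifier that stays stable across workflow versions. It shows that a point question such as "when was this cell added?" reduces to a single query:

```python
import sqlite3

# HYPOTHETICAL view of a project's history: one row per workflow version and
# one row per (workflow version, cell). Vizier does not expose this schema.
SCHEMA = """
CREATE TABLE workflows (
    workflow_id INTEGER PRIMARY KEY,
    branch      TEXT NOT NULL,
    created     TEXT NOT NULL      -- ISO-8601 timestamp, sorts lexically
);
CREATE TABLE workflow_cells (
    workflow_id INTEGER NOT NULL REFERENCES workflows(workflow_id),
    position    INTEGER NOT NULL,  -- cell index within that workflow
    cell_id     TEXT    NOT NULL,  -- assumed stable across versions
    PRIMARY KEY (workflow_id, position)
);
"""

# "When was this cell added?" = the earliest workflow version containing it.
CELL_ADDED = """
SELECT w.workflow_id, w.created
FROM workflow_cells AS c
JOIN workflows AS w ON w.workflow_id = c.workflow_id
WHERE c.cell_id = ? AND w.branch = ?
ORDER BY w.created, w.workflow_id
LIMIT 1;
"""


def cell_added(conn: sqlite3.Connection, cell_id: str, branch: str):
    """Return (workflow_id, created) or None if the cell never appears."""
    return conn.execute(CELL_ADDED, (cell_id, branch)).fetchone()


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory history: 'clean' first appears in workflow 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO workflows VALUES (?, ?, ?)", [
        (1, "main", "2023-01-01T10:00"),
        (2, "main", "2023-01-02T10:00"),
        (3, "main", "2023-01-03T10:00"),
    ])
    conn.executemany("INSERT INTO workflow_cells VALUES (?, ?, ?)", [
        (1, 0, "load"),
        (2, 0, "load"), (2, 1, "clean"),
        (3, 0, "load"), (3, 1, "clean"), (3, 2, "plot"),
    ])
    return conn
```

By contrast, a search that should touch as few versions as possible, such as bisecting for the first version in which a dataset changed, is harder to express as one declarative query; that is the kind of iterative computation the Con above refers to.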

Describe alternatives you've considered

See above. We're not realistically going to provide UIs for all of these questions, so let's give users direct access.