VizierDB / vizier-scala

The Vizier kernel-free notebook programming environment

Stop dataset message hydration from blocking forward cell progress #231

Open: okennedy opened this issue 1 year ago


What pain point is this feature intended to address? Please describe.

Several cells, most notably the SQL cell and most Mimir lenses, are designed to first generate a dataset and then display it as a message. Defining the dataset is relatively cheap (it just defines a new dataframe constructor), but generating the dataset's preview is significantly more expensive. Unfortunately, because of the way dependencies are tracked, subsequent cells cannot execute until the expensive preview step finishes too. A notebook that could finish almost instantaneously can instead take several minutes.

Describe the solution you'd like

Provide a way for a cell to signal to ExecutionContext (e.g., ExecutionContext.noMoreArtifacts()) that it will generate no further artifacts, so that any dependent cells are free to proceed. ExecutionContext can then crash the cell if it subsequently tries to output an artifact.
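A minimal sketch of how such a signal could work, assuming a simplified per-cell context object. Only `noMoreArtifacts()` comes from the proposal; `CellContext`, `output`, `message`, `artifactsFinal`, and `finish` are hypothetical names for illustration, not Vizier's actual ExecutionContext API. Dependent cells wait on a `Promise` that completes either when the cell calls `noMoreArtifacts()` or when the cell body returns, and any artifact emitted after that point throws.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical, simplified stand-in for a cell's execution context.
// Not Vizier's real ExecutionContext API; names are illustrative only.
class CellContext(val cellId: String) {
  private val artifacts = mutable.Map.empty[String, String]
  private val messages  = mutable.ArrayBuffer.empty[String]
  private val finalized = Promise[Map[String, String]]()

  /** Dependent cells wait on this future rather than on the whole cell finishing. */
  def artifactsFinal: Future[Map[String, String]] = finalized.future

  /** Record an artifact; fails once the cell has promised no more artifacts. */
  def output(name: String, value: String): Unit = synchronized {
    if (finalized.isCompleted)
      throw new IllegalStateException(
        s"cell $cellId output artifact '$name' after noMoreArtifacts()")
    artifacts(name) = value
  }

  /** Messages (e.g. a dataset preview) are still allowed after the signal. */
  def message(text: String): Unit = synchronized { messages += text; () }

  /** The proposed signal: freeze the artifact set and release dependents. */
  def noMoreArtifacts(): Unit = synchronized {
    finalized.trySuccess(artifacts.toMap); ()
  }

  /** Called when the cell body returns; a no-op if the signal was already sent. */
  def finish(): Unit = noMoreArtifacts()
}

object CellContextDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new CellContext("sql-1")
    ctx.output("result", "dataframe constructor for the query") // cheap step
    ctx.noMoreArtifacts() // downstream cells may start now

    // A dependent cell sees the finalized artifacts immediately.
    val visible = Await.result(ctx.artifactsFinal, 1.second)
    println(visible.keys.mkString(", ")) // prints result

    // The expensive preview is still computed, but only as a message.
    ctx.message("preview rows ...")

    // A late artifact is rejected, which would fail the cell.
    println(Try(ctx.output("late", "x")).isFailure) // prints true
    ctx.finish()
  }
}
```

In this arrangement a SQL cell would output its dataset artifact, call `noMoreArtifacts()`, and only then build the preview message, so dependents are unblocked before the expensive step begins.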

Describe alternatives you've considered

One alternative would be to hydrate dataset messages lazily. An unhydrated dataset message could always be populated by querying the database (this query usually happens anyway, since the workflow generally loads more than just the preview rows, but users will see spinners in the meantime). The challenge is performing the hydration itself: it requires database access, so it is not something we should do lazily when the artifact is accessed. We might be able to spin up a background worker to manage hydration...
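A rough sketch of the background-worker variant, assuming a cell records an `Unhydrated` placeholder and returns immediately while a separate worker later runs the preview query. `DatasetMessage`, `Unhydrated`, `Hydrated`, `HydrationWorker`, and the `query` function are hypothetical. The `ExecutionContext` used here is Scala's `scala.concurrent.ExecutionContext` wrapping a single-thread pool, unrelated to Vizier's ExecutionContext.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical message states: a cell records an Unhydrated placeholder and
// returns immediately; the preview rows are filled in later.
sealed trait DatasetMessage
final case class Unhydrated(datasetId: String, limit: Int) extends DatasetMessage
final case class Hydrated(datasetId: String, rows: Seq[Seq[String]]) extends DatasetMessage

// Hypothetical worker that owns database access for hydration, so nothing
// queries the database lazily when an artifact happens to be read.
class HydrationWorker(query: (String, Int) => Seq[Seq[String]]) {
  private val pool = Executors.newSingleThreadExecutor()
  // scala.concurrent.ExecutionContext (a thread pool), not Vizier's ExecutionContext.
  private implicit val workerEc: ExecutionContext =
    ExecutionContext.fromExecutorService(pool)

  /** Run the preview query off the cell's critical path. */
  def hydrate(msg: Unhydrated): Future[Hydrated] =
    Future(Hydrated(msg.datasetId, query(msg.datasetId, msg.limit)))

  def shutdown(): Unit = pool.shutdown()
}

object HydrationDemo {
  def main(args: Array[String]): Unit = {
    // Fake query standing in for the real preview query against the database.
    val worker = new HydrationWorker((id, n) => (1 to n).map(i => Seq(id, i.toString)))
    val pending = Unhydrated("ds-42", 3) // what the cell would store; UI shows a spinner
    val hydrated = Await.result(worker.hydrate(pending), 5.seconds)
    println(hydrated.rows.size) // prints 3
    worker.shutdown()
  }
}
```

Routing all hydration through one worker keeps database access in a single, explicit place, which sidesteps the concern about querying the database lazily whenever an artifact is read.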