For reasons of performance/sanity, we've been forced to make a number of compromises on reproducibility. For example, LOAD DATASET can read from a URL or from the local filesystem (and that's generally desirable!). However, if the underlying file changes, the resulting dataset/workflow becomes stale.
We should be able to communicate this situation to users. The proposed approach is to define a new cell state (name TBD, tentative: INCONSISTENT) indicating that the cell is valid w.r.t. the current workflow, but stale w.r.t. the outside world. Planned implementations include:
[ ] LOAD DATASET cells reading from the local FS can use timestamps and hashes to flag when the underlying file has changed.
[ ] PYTHON cells should flag when the Python environment being used has changed destructively (new package version, deleted package).
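
To make the timestamp-and-hash idea for LOAD DATASET concrete, here is a rough sketch, not a proposed implementation. The names `FileFingerprint`, `fingerprint`, and `is_stale` are hypothetical, and it assumes the cell can store a small record alongside its output when it runs. The timestamp and size act as a cheap first check; the hash is only computed when they differ.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileFingerprint:
    """Hypothetical record stored when a LOAD DATASET cell runs."""
    mtime_ns: int
    size: int
    sha256: str


def _sha256(path, chunk_size=1 << 20):
    # Hash the file in chunks so large datasets are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint(path):
    """Capture timestamp, size, and content hash of the file as it is now."""
    st = os.stat(path)
    return FileFingerprint(st.st_mtime_ns, st.st_size, _sha256(path))


def is_stale(path, recorded):
    """Return True if the file no longer matches the recorded fingerprint."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return True  # a deleted file also makes the cell stale
    # Cheap check first: unchanged timestamp and size are treated as "unchanged".
    if st.st_mtime_ns == recorded.mtime_ns and st.st_size == recorded.size:
        return False
    # Timestamp or size differs: hash to see whether the contents really changed.
    return _sha256(path) != recorded.sha256
```

A cell for which `is_stale(...)` returns True would be a candidate for the INCONSISTENT state. Note that skipping the hash when timestamp and size match trades a little accuracy for speed; always hashing would be the stricter option.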
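
For the PYTHON cell case, one possible reading of "changed destructively" is: a package that was present when the cell last ran is now missing or at a different version, while newly installed packages are ignored. The sketch below follows that assumption. `snapshot_environment` and `destructive_changes` are hypothetical names, and the snapshot relies on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def snapshot_environment():
    """Map installed distribution names (lowercased) to their versions."""
    snap = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            snap[name.lower()] = dist.version
    return snap


def destructive_changes(recorded, current):
    """Return {package: (old_version, new_version_or_None)} for destructive changes.

    A change is considered destructive (an assumption of this sketch) when a
    recorded package was removed or its version differs. Packages that only
    appear in `current` are not reported.
    """
    changes = {}
    for name, old_version in recorded.items():
        new_version = current.get(name)
        if new_version is None:
            changes[name] = (old_version, None)
        elif new_version != old_version:
            changes[name] = (old_version, new_version)
    return changes
```

A cell could store the result of `snapshot_environment()` when it runs and later be flagged if `destructive_changes(stored, snapshot_environment())` is non-empty.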