Open fonsp opened 8 months ago
I am quite unfamiliar with the code base, so maybe a that's long shot here..
Remembering the original order in which cells appear in the file means that this TopologicalOrder
should be stored in the Notebook
struct. That looks invasive, since a new struct field must be added.
I would personally strive for a solution to keep a single TopologicalOrder
around.
Can we make this choice directly during load_notebook
?
So, trust the script *.jl cell orders to directly get the TopologicalOrder
(e.g. in _cached_topological_order
), without doing static_resolve_topology
.
A version of static_resolve_topology
will serve later to incrementally update the TopologicalOrder
if we don't have the means to evaluate code, but the notebook is modified.
I am also troubled a bit with
checking if we have enough information to fully derive the order.
How would you conduct this check ? Is Notebook.process_status
reliable for such a thing ?
I'm poking at this a bit and also generally unfamiliar.
My naive thinking is:
Since this seems simple enough, I feel like I must be missing something?
You mean storing the order as _cached_topological_order
, right? That might work! Adding a new field would be cleaner though, the idea is that _cached_topological_order
does not store any information on its own, it just caches the output of topological_order(nb.topology)
.
But feel free to try! Writing the tests without implementation would already be super helpful.
Is this understanding correct?
So, is there any fundamental reason why this could not be done easily?
Yes that sounds correct and like a good idea!
One thing to note is that generally, saving does not cause calculations, but calculations (like a new topology) are caused by code execution (and obviously, code edits). For example, when you exit safe mode, we execute all cells, which means that more information about the toplogy becomes available (like an executed using
or macro definitions) and we recalculate the toplogy, and save again.
Feel free to start a draft PR! I think your approach sounds good. I'm curious to see what it looks like when the notebook.topology
is used to store the "imported topology". I can imagine that a new field eg notebook.imported_topology
will be cleaner but that would be a small tweak :) Thanks for taking the time to look into this!
Is there an example notebook that has an issue? Because in safe preview it seems like it's ordering cells appropriately in simple examples.
We recently launched Safe preview (#2563, take a look), but it has one missing feature!
When you open a notebook without running it, and you make a small change (eg fix a typo in a cell), Pluto will save the file, but the order of cells in the file can be drastically different, and you get a big git diff. Try this with one of your more complex notebooks to see what I mean, and take a look at which cells get moved and why.
The cause
Pluto saves cells in the
.jl
file in an order that allows the cells to be executed as a standalone script. It essentially takes the visual cell order, and then we do a stable sort based on cell dependencies.The reason why the issue happens is that Pluto often needs to evaluate your code to fully understand cell dependencies. The main examples are
using Plots
defines the symbolscatter
@gensym x
looks like it referencesx
, but it actually definesx
You cannot derive this info statically, so we run code.
When you open a notebook without running it, we only have information that can be derived statically. When we save your notebook, we do the best we can using that information, but this can lead to a very different file order than what was previously derived (based on complete information), leading to a large diff.
Solution idea
My idea for solving this is: when we load a notebook file, also remember the original order in which cells appear in the file. Whenever you save a notebook, check if we have enough information to fully derive the order. If so, remove the memorized order and do as usual. If not, try to stick to the original file order as much as possible (i.e. use the original file order as a way to disambiguate the unknown cell deps).
Relevant code is here:
Pluto's reactivity algorithm. The core is this function that sorts a list of cells into search order https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/src/analysis/topological_order.jl#L10-L137
The output of this function is: https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/src/analysis/TopologicalOrder.jl
One of the inputs of this function is: https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/src/analysis/Topology.jl#L28-L36 which is a snapshot of the cells, and their symbol definitions etc that are known at one point in time.
The Notebook struct: https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/src/notebook/Notebook.jl#L24-L65
Reading and writing notebook files: https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/src/notebook/saving%20and%20loading.jl
For tests, search for the word "order" in this file: https://github.com/fonsp/Pluto.jl/blob/v0.19.29-src/test/React.jl Also search for only_versions_or_lineorder_differ in the source code.