att / rcloud

Collaborative data analysis and visualization
http://rcloud.social
MIT License

include metadata in the stored notebook #2713

Open s-u opened 4 years ago

s-u commented 4 years ago

We should add a metadata asset to notebooks. Currently, some information such as title, visibility and notebook type is stored only in RCS, even though it is inherently part of the notebook. For example, if a notebook is imported from another source, that information is lost.

The proposal is to add a .meta.json asset containing any metadata that is expected to be durable.

Consideration has to be given to the impact on history.
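To make the proposal concrete, here is a minimal sketch of what building such an asset could look like. The field names (`title`, `visibility`, `type`) and the helper `build_meta_asset` are hypothetical; they only mirror the examples mentioned above and are not an agreed schema.

```python
import json

# Asset name taken from the proposal above.
META_FILENAME = ".meta.json"


def build_meta_asset(title, visibility, notebook_type):
    """Serialize durable notebook metadata to a JSON string.

    The keys are assumptions for illustration, not a decided format.
    sort_keys gives stable output, so the asset text only changes
    when a value actually changes.
    """
    meta = {
        "title": title,
        "visibility": visibility,
        "type": notebook_type,
    }
    return json.dumps(meta, indent=2, sort_keys=True)


asset = build_meta_asset("My notebook", "public", "default")
```

Stable serialization matters here because every textual change to the asset would become a commit.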

gordonwoodhull commented 4 years ago

What impact on history are you concerned about?

The main problem I see is that "old" notebooks won't have any metadata. So anything that relies on metadata will need to be able to deal with nulls.

Other than that, we just always add it when saving, if it's not there, right?
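Roughly what I have in mind, as a sketch only: the notebook is modelled as a plain dict of filename to content, and `read_meta` / `ensure_meta_on_save` are hypothetical helper names, not existing RCloud functions.

```python
import json

META_FILENAME = ".meta.json"


def read_meta(files):
    """Return the parsed metadata dict, or None for "old" notebooks.

    `files` is assumed to map asset names to their text content.
    A missing or unparseable asset is treated the same way: None.
    """
    raw = files.get(META_FILENAME)
    if raw is None:
        return None
    try:
        meta = json.loads(raw)
    except ValueError:
        return None
    return meta if isinstance(meta, dict) else None


def ensure_meta_on_save(files, current_meta):
    """Add the metadata asset when saving if it is not there yet.

    Returns `files` unchanged when usable metadata already exists,
    otherwise a copy with the asset added.
    """
    if read_meta(files) is not None:
        return files
    updated = dict(files)
    updated[META_FILENAME] = json.dumps(current_meta, sort_keys=True)
    return updated
```

Anything that consumes the metadata would call `read_meta` and handle the `None` case explicitly.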

s-u commented 4 years ago

What I meant is that any changes to the metadata will trigger a commit, so there may be no visible change for the notebook (i.e., cells) in the history (e.g. if you do commit-based undos) which may be confusing.
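One possible way to deal with that, sketched under the assumption that two versions of a notebook are available as dicts of filename to content: detect commits whose only change is the metadata asset, so the history view or a commit-based undo could label or skip them. The function names are hypothetical.

```python
META_FILENAME = ".meta.json"


def changed_files(before, after):
    """Names of assets that were added, removed or modified."""
    names = set(before) | set(after)
    return {name for name in names if before.get(name) != after.get(name)}


def is_metadata_only_commit(before, after):
    """True when the metadata asset is the only thing that changed."""
    return changed_files(before, after) == {META_FILENAME}
```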

gordonwoodhull commented 4 years ago

.meta.json could/should contain any attributes that are not part of the notebook but which are stored in RCS.

RCS notebook key documentation.

Yes:

Maybe:

No:

I'm not sure what you mean about notebook source - I guess you are referring to what instance it ran on. We can't write to a foreign notebook, and there isn't any way to know the name of the current instance as a foreign source for anyone else. It's never really "imported". If it's forked from a foreign source, that could perhaps be recorded.

gordonwoodhull commented 4 years ago

As usual, our lack of a coherent strategy for editing notebooks is going to make this more difficult than it should be.

Assume that the user has the new "Show hidden assets" option #2715 turned on. So .meta.json will be visible in the assets pane.

But now .meta.json will be changing in response to arbitrary commands: when the metadata is first added to a notebook, when its visibility or publish bit is changed, etc.

We don't have any way of handling this, currently, because we assume that all changes come directly from the UI. When the user adds a cell, we add it to the model and then save the model. We don't ever compare the gist content with the model and see if changes came back from the server.

It seems to me that we have little choice but to special-case this one file. After any save, we get the notebook content back and we can compare that one asset. This is better than instrumenting the notebook tree and whatever else to signal that it is doing something that changes the metadata, and it allows the server to change metadata. (Although I think the client will do this usually, since it has all the info on hand.)
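A rough sketch of that special case, assuming both the client model and the content returned from a save can be viewed as dicts of filename to content. `sync_meta_after_save` is a hypothetical name, not existing client code.

```python
META_FILENAME = ".meta.json"


def sync_meta_after_save(model_files, saved_files):
    """Compare only the metadata asset after a save.

    Returns (files, changed). When the saved content disagrees with
    the model for this one asset, the saved version wins and a new
    dict is returned; all other assets are left to the model.
    """
    local = model_files.get(META_FILENAME)
    remote = saved_files.get(META_FILENAME)
    if local == remote:
        return model_files, False

    updated = dict(model_files)
    if remote is None:
        # The asset disappeared on the server side.
        updated.pop(META_FILENAME, None)
    else:
        updated[META_FILENAME] = remote
    return updated, True
```

The `changed` flag is what the assets pane would use to decide whether to refresh that one entry.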

I'm less concerned about the UI problem for the user. If they want to know exactly what happened in a commit, they can turn on "Show hidden assets" and see it.