deshaw / jupyterlab-execute-time

A JupyterLab extension for displaying cell timings
BSD 3-Clause "New" or "Revised" License

Using this extension with git gives messy git diffs #130

Open grzegorz700 opened 1 month ago

grzegorz700 commented 1 month ago

When we use this extension with git-based systems, it produces 10 (5 × 2) lines of diff for every changed cell:

```diff
   "metadata": {
    "execution": {
-    "iopub.execute_input": "2024-10-14T13:10:30.905308Z",
-    "iopub.status.busy": "2024-10-14T13:10:30.904740Z",
-    "iopub.status.idle": "2024-10-14T13:10:30.908169Z",
-    "shell.execute_reply": "2024-10-14T13:10:30.907722Z",
-    "shell.execute_reply.started": "2024-10-14T13:10:30.905290Z"
+    "iopub.execute_input": "2024-10-14T19:16:26.414571Z",
+    "iopub.status.busy": "2024-10-14T19:16:26.413960Z",
+    "iopub.status.idle": "2024-10-14T19:16:26.417570Z",
+    "shell.execute_reply": "2024-10-14T19:16:26.417137Z",
+    "shell.execute_reply.started": "2024-10-14T19:16:26.414551Z"
    }
   },
```
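To see where the count of 10 comes from, here is a small Python sketch that serializes the cell metadata from both runs (roughly the way notebooks are stored on disk) and counts the removed and added lines with `difflib`. The helpers `changed_lines` and `without_execution` are illustrative only, not part of any library. Each of the five timestamps contributes one `-` line and one `+` line; once the `execution` key is dropped, nothing is left to diff.

```python
import difflib
import json

# Cell metadata from the two runs in the diff above (same cell, re-executed).
RUN_A = {"execution": {
    "iopub.execute_input": "2024-10-14T13:10:30.905308Z",
    "iopub.status.busy": "2024-10-14T13:10:30.904740Z",
    "iopub.status.idle": "2024-10-14T13:10:30.908169Z",
    "shell.execute_reply": "2024-10-14T13:10:30.907722Z",
    "shell.execute_reply.started": "2024-10-14T13:10:30.905290Z",
}}
RUN_B = {"execution": {
    "iopub.execute_input": "2024-10-14T19:16:26.414571Z",
    "iopub.status.busy": "2024-10-14T19:16:26.413960Z",
    "iopub.status.idle": "2024-10-14T19:16:26.417570Z",
    "shell.execute_reply": "2024-10-14T19:16:26.417137Z",
    "shell.execute_reply.started": "2024-10-14T19:16:26.414551Z",
}}


def changed_lines(old: dict, new: dict) -> int:
    """Count '-' and '+' lines in a unified diff of two JSON-serialized dicts."""
    a = json.dumps(old, indent=1, sort_keys=True).splitlines()
    b = json.dumps(new, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(a, b, lineterm="")
    # Skip the '---'/'+++' file headers; count only real removals/additions.
    return sum(
        1
        for line in diff
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
    )


def without_execution(meta: dict) -> dict:
    """Return a copy of the cell metadata without the timing entry."""
    return {k: v for k, v in meta.items() if k != "execution"}


print(changed_lines(RUN_A, RUN_B))  # 10
print(changed_lines(without_execution(RUN_A), without_execution(RUN_B)))  # 0
```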

This is a well-known problem with no perfect solution. Based on many existing solutions, including a list on Stack Overflow, other Stack Overflow answers, and some good advice in https://github.com/jupyterlab/jupyterlab/issues/9444#issuecomment-743992307, I propose the following partial workaround.

Partial workaround:

Using this extension without massive diffs takes two stages:

## Prevent the metadata from being pushed to git:

1. Create or edit a `.gitattributes` file in the root of your repository:
   ```shell
   touch .gitattributes
   ```
2. Add the following line to the `.gitattributes` file:
   ```
   *.ipynb filter=clean_meta_ipynb
   ```
3. Run:
   ```shell
   git config filter.clean_meta_ipynb.clean "jupyter nbconvert --to notebook --stdin --stdout --ClearMetadataPreprocessor.enabled=True"
   ```
   or
   ```shell
   git config filter.clean_meta_ipynb.clean "nbstripout-fast --keep-output --keep-count --textconv"
   ```
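With its default settings, `ClearMetadataPreprocessor` clears all cell metadata (tags included), not only the timings. If you want to keep the rest, a narrower clean filter is possible. Below is a minimal sketch, assuming a hypothetical script of your own named `strip_execution_metadata.py` (it is not part of this extension, nbconvert, or nbstripout-fast): it reads the notebook that git pipes in on stdin, drops only the `execution` key from each cell's metadata, and writes the result to stdout. The `--stdin` argument is a convention invented for this sketch.

```python
#!/usr/bin/env python3
"""Hypothetical git clean filter (strip_execution_metadata.py): removes only the
per-cell "execution" timing metadata and leaves all other metadata intact."""
import json
import sys
from typing import BinaryIO


def strip_execution(nb: dict) -> dict:
    """Drop the "execution" key from each cell's metadata, in place."""
    for cell in nb.get("cells", []):
        cell.get("metadata", {}).pop("execution", None)
    return nb


def clean(src: BinaryIO, dst: BinaryIO) -> None:
    """Read a notebook from src, strip timings, write it to dst as UTF-8 JSON."""
    nb = strip_execution(json.loads(src.read()))
    # indent=1 and sort_keys=True approximate nbformat's on-disk layout, so the
    # remaining lines are less likely to be reformatted by the filter.
    text = json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
    dst.write(text.encode("utf-8"))


if __name__ == "__main__" and "--stdin" in sys.argv[1:]:
    # git pipes the file content through stdin and stores what we print to stdout.
    clean(sys.stdin.buffer, sys.stdout.buffer)
```

To use it, point the filter at the script instead, e.g. `git config filter.clean_meta_ipynb.clean "python3 /path/to/strip_execution_metadata.py --stdin"`, where the path is a placeholder for wherever you saved it.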

## Prevent the diffs from being displayed in jupyterlab-git:
1. Check where your nbdime config files (named `nbdime_config.json`) are located:
   ```shell
   jupyter --paths
   ```
2. Create or update your `nbdime_config.json` (e.g. `~/.jupyter/nbdime_config.json`).
3. Add the following lines to it:
   ```json
   {
     "NbDiff": {
       "Ignore": {
         "/metadata": true,
         "/cells/*/metadata": true
       }
     },
     "Extension": {
       "Ignore": {
         "/metadata": true,
         "/cells/*/metadata": true
       }
     }
   }
   ```
   Or you could try a more precise exclusion that ignores only the timing metadata: `"/cells/*/metadata": ["execution"]`.
4. Restart JupyterLab.
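Steps 2 and 3 can also be scripted so that an existing `nbdime_config.json` is updated rather than overwritten. This is a minimal Python sketch, assuming the narrower `["execution"]` rule mentioned above and the `NbDiff`/`Extension` sections from the JSON example; `merge_ignore` and `update_config_file` are hypothetical helpers, not part of nbdime.

```python
"""Hypothetical helper: merge metadata-ignore rules into an nbdime_config.json
without discarding any settings that are already there."""
import json
from pathlib import Path

# Ignore only the "execution" entry (the timings) inside each cell's metadata.
IGNORE = {"/cells/*/metadata": ["execution"]}


def merge_ignore(config: dict, ignore: dict = IGNORE) -> dict:
    """Add the ignore rules to the NbDiff and Extension sections, in place."""
    for section in ("NbDiff", "Extension"):
        rules = config.setdefault(section, {}).setdefault("Ignore", {})
        rules.update(ignore)  # existing rules for other paths are kept
    return config


def update_config_file(path: Path) -> None:
    """Load the config if it exists, merge the rules, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(merge_ignore(config), indent=2) + "\n"
    path.write_text(text, encoding="utf-8")
```

For example, `update_config_file(Path.home() / ".jupyter" / "nbdime_config.json")`, followed by a JupyterLab restart. To ignore all cell and notebook metadata as in the JSON above, pass `{"/metadata": True, "/cells/*/metadata": True}` as the rules instead.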

Drawbacks:

I'm sharing this solution especially for people who want to use this extension without having to remove other information (e.g. outputs) from their notebooks.

However, I would love to see a better solution.

mlucool commented 1 month ago

This question may be better suited outside this plugin, but have you tried https://github.com/deshaw/nbstripout-fast? With it, the timestamps neither show up in nbdime diffs nor get committed.

grzegorz700 commented 1 month ago

Thank you for the library reference. nbstripout-fast works well for this purpose: it speeds things up and removes one of the drawbacks of this partial solution. I'll update the post with this second, faster option as well. It's hard to find the best place to track this type of problem, and most discussions focus on removing outputs from cells rather than metadata. This extension produces a substantial amount of frequently changing metadata, so I decided to post the problem and my partial solution here.

Feel free to close this issue if you want. I wrote this post to help others, because I couldn't find a good solution to this particular sub-problem elsewhere (other issues, Stack Overflow, etc.). I hope it will now be easier to find, whether the issue is closed or not.

mlucool commented 1 month ago

> this extension produces a substantial amount of frequently changing metadata

This extension actually produces no metadata; it's just a renderer. For simplicity, we turn on an option that makes JupyterLab itself record it.