ReviewNB / support

Issues and feature requests for ReviewNB
https://reviewnb.com

Databricks exported notebooks are shown as all different while only one line and an image changed #102

Closed edmondop closed 2 years ago

edmondop commented 2 years ago

See

https://github.com/edmondo1984/databrick-notebooks-/pull/1

The only thing that changed is sin to cos.

amit1rrr commented 2 years ago

As you can see in the GitHub diff, Databricks writes the notebook as one JSON blob on a single line. Unlike the standard Jupyter editor, it puts no newline characters in the notebook files it creates or edits. As a result, git cannot localize changes within the notebook file: the git patch simply says that this single long line was edited, even though only one word changed in your example. Our diffing algorithm is based on git diff / patch, and since those are not useful for Databricks notebooks, ReviewNB does not work well with them.
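To make the single-line problem concrete, here is a minimal sketch using Python's standard `difflib` as a stand-in for a line-based diff (it is not git's or ReviewNB's actual implementation). The notebook content and the helper names `make_notebook` and `changed_lines` are hypothetical, made up for illustration.

```python
import difflib
import json


def make_notebook(func_name):
    """Build a tiny notebook-like dict with one code cell (hypothetical content)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["import numpy as np\n", f"y = np.{func_name}(x)"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def changed_lines(old_text, new_text):
    """Return the removed/added lines that a line-based diff reports."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


old_nb, new_nb = make_notebook("sin"), make_notebook("cos")

# Whole notebook on one line: the diff can only say "this line changed".
single_line = changed_lines(json.dumps(old_nb), json.dumps(new_nb))

# One JSON item per line: the diff points at the edited source line.
multi_line = changed_lines(json.dumps(old_nb, indent=1), json.dumps(new_nb, indent=1))

print(len(single_line[0]), "characters flagged in the single-line file")
print(multi_line)
```

In the single-line case the removed "line" is the entire notebook, so a tool that builds on the patch has nothing finer-grained to work with. In the indented case the same edit shows up as one short removed line and one short added line.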

There's no easy workaround for us in this situation. Asking Databricks to store notebooks in the standard Jupyter format might help, or you could simply use the standard Jupyter / JupyterLab client.
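As a rough user-side illustration of what "standard Jupyter format" means for line layout, the sketch below re-serializes single-line notebook JSON with one item per line. It uses only the standard `json` module. The helper name `normalize_notebook_text` is hypothetical, and the choice of `indent=1` with sorted keys is an assumption about a Jupyter-like layout. Nothing in this thread confirms that running such a step makes ReviewNB's diff work for Databricks exports.

```python
import json


def normalize_notebook_text(raw_text):
    """Re-serialize notebook JSON with one item per line.

    Assumption: indent=1 and sorted keys approximate the layout written by
    Jupyter clients. The notebook content itself is left unchanged.
    """
    notebook = json.loads(raw_text)
    return json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


# Hypothetical single-line export, standing in for a Databricks notebook file.
raw = json.dumps(
    {
        "cells": [{"cell_type": "code", "metadata": {}, "source": ["y = np.cos(x)"]}],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

normalized = normalize_notebook_text(raw)
print(normalized)
```

If both branches of a pull request were normalized the same way before committing, a line-based diff would at least have separate lines to compare.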

edmondop commented 2 years ago

After breaking the notebook down into three cells, it looks like there's a problem in the diff. ReviewNB can't show side by side the first cell of notebook A from branch main and the first cell of notebook A from branch B. Can you explain why this would come from the notebook export format? It looks to me like a parser problem.

amit1rrr commented 2 years ago

> ReviewNB can't show side by side the first cell of notebook A from branch main and the first cell of notebook A from branch B.

Are these notebooks created / edited with standard Jupyter / JupyterLab? Please share the URL so we can take a look.

amit1rrr commented 2 years ago

Closing due to lack of response. Please feel free to reopen anytime.

edmondop commented 2 years ago

Apologies @amit1rrr. You can try with nbdime, which shows the diff correctly. That's why I believe it's a problem with ReviewNB.

amit1rrr commented 2 years ago

That's right. nbdime can take any two valid notebook files and generate a diff between them (which is great!), whereas the ReviewNB diff only works for notebooks created with the classic Jupyter / JupyterLab client (or any other client that produces notebook files in exactly the same format). So I will accept that this is a limitation by design when compared to nbdime.
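The difference between the two approaches can be sketched as follows. A content-aware tool parses both files and compares cells, so the line layout of the JSON does not matter. This is a simplified illustration using only the standard `json` module, not nbdime's actual algorithm. The helper names are hypothetical, and the sketch pairs cells by position only, ignoring outputs and added or removed cells.

```python
import json


def cell_source(cell):
    """Return a cell's source as one string (it may be stored as a list of lines)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def changed_cells(old_text, new_text):
    """Compare two notebooks cell by cell after parsing the JSON.

    Returns (index, old_source, new_source) for every cell whose source differs.
    """
    old_cells = json.loads(old_text).get("cells", [])
    new_cells = json.loads(new_text).get("cells", [])
    changes = []
    for index, (old, new) in enumerate(zip(old_cells, new_cells)):
        before, after = cell_source(old), cell_source(new)
        if before != after:
            changes.append((index, before, after))
    return changes


# Both inputs are single-line JSON, as in the Databricks export discussed here.
old_text = json.dumps({"cells": [
    {"cell_type": "code", "source": ["import numpy as np"]},
    {"cell_type": "code", "source": ["y = np.sin(x)"]},
]})
new_text = json.dumps({"cells": [
    {"cell_type": "code", "source": ["import numpy as np"]},
    {"cell_type": "code", "source": ["y = np.cos(x)"]},
]})

for index, before, after in changed_cells(old_text, new_text):
    print(f"cell {index}: {before!r} -> {after!r}")
```

Because the comparison happens on parsed cells rather than on text lines, only the second cell is reported even though each file is a single line.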

P.S. This limitation exists mainly because ReviewNB's diff algorithm relies on the git patch, which is not useful when the entire notebook JSON is on a single line (as in the notebooks produced by Databricks).