Open prise6 opened 1 year ago
Hi @prise6, I think I use the same strategy as you, but with some minor differences: I try to keep writing code in my IDE, also for the notebook. Since it is enough to just hit the save button, the code is automatically updated on the remote. Then I run it and check it interactively. From time to time it happens that I find myself coding some quick changes in the notebook in Databricks. The easiest thing for me then is to copy those changes back to my local copy.
When I am done developing the feature, I use a job to execute it instead of pulling the changes into the same folder (I currently still do this manually, but hope to change that in the future). However, I think that discarding the changes in the Databricks Repo, as you describe, could be the way to go.
I follow the same development process as @alexeyegorov. This has worked pretty well for me. Further, I typically use a repo in Databricks that is not linked to a remote Git repository, since I am syncing from my local copy anyway. You can set this up by unchecking "Create repo by cloning a Git repository" in the "Add Repo" dialog. You could then have a second repo that is linked to a Git repository, from which you can pull all the changes for testing.
If you do want to continue developing with only a single repo but without the conflict troubles, there is another option you could consider, though it takes a little more setup. In the dbx sync reference you'll see some details about using dbx sync dbfs. You can add a cell like the one below near the top of your notebook, before you import other modules. This will cause it to import your code from the DBFS path instead of the repo, because the DBFS path appears first in the sys.path list. The Python modules you've edited will only sync to DBFS and not to the repo, so you won't run into conflicts. This was the dev process I used when I first developed the sync command, but these days I find it easiest to just sync to the unlinked repo with dbx sync repo as described above.
import sys
if "/dbfs/tmp/users/first.last/myrepo" not in sys.path:
    sys.path.insert(0, "/dbfs/tmp/users/first.last/myrepo")
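To see why the cell above works, here is a small standalone demonstration of the sys.path precedence it relies on (mymod and the temporary directories are hypothetical stand-ins for the DBFS and Repo copies, not part of dbx):

```python
import sys
import tempfile
import pathlib

# Assumption: plain local directories stand in for the DBFS and Repo paths.
# Python imports the first match found on sys.path, so inserting the DBFS
# copy at index 0 shadows the copy of the same module living in the Repo.
dbfs_dir = pathlib.Path(tempfile.mkdtemp())
repo_dir = pathlib.Path(tempfile.mkdtemp())
(dbfs_dir / "mymod.py").write_text("VERSION = 'dbfs-copy'\n")
(repo_dir / "mymod.py").write_text("VERSION = 'repo-copy'\n")

sys.path.append(str(repo_dir))      # the Repo path is already on sys.path
sys.path.insert(0, str(dbfs_dir))   # the notebook cell puts the DBFS path in front

import mymod
print(mymod.VERSION)  # prints "dbfs-copy"
```

Because the lookup is first-match-wins, edits synced to the DBFS copy take effect on the next fresh import even though the Repo still holds the older version.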
Hello,
As documented in "mixed-mode dev loop for python project" (official doc):
Simple tree of project:
I use dbx sync to update this package and to make it available to the remote Repo.
At this final 6th step, here is my issue, so to speak: I've got conflicts because dbx sync has already updated the target files (that's normal). My workaround is to discard the changes before pulling. BUT, I want to keep the notebooks I updated in the Databricks workspace, so I discard the changed files one by one, all files except the notebooks.
My questions:
One solution is to commit the notebooks first and then discard the remaining changes.
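The file-by-file discard described above could also be scripted from the local clone. A minimal sketch, with the caveats that the function names are hypothetical and that recognizing notebooks by the .ipynb suffix is an assumption (adjust it if your notebooks are stored as .py sources):

```python
import subprocess

def files_to_discard(changed_files, keep_suffixes=(".ipynb",)):
    # Keep anything that looks like a notebook; everything else gets reverted.
    # keep_suffixes is an assumption: change it to match how your notebooks
    # are stored (e.g. ".py" when using Databricks source-format notebooks).
    return [f for f in changed_files if not f.endswith(tuple(keep_suffixes))]

def discard_all_but_notebooks(repo_dir="."):
    # List files with unstaged changes, then check out every non-notebook file.
    changed = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, cwd=repo_dir, check=True,
    ).stdout.splitlines()
    for path in files_to_discard(changed):
        subprocess.run(["git", "checkout", "--", path], cwd=repo_dir, check=True)
```

For example, files_to_discard(["src/mod.py", "analysis.ipynb"]) returns only ["src/mod.py"], so the notebook's local edits survive the cleanup before the pull.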