databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
https://dbx.readthedocs.io

dbx sync fails after git rebase #685

Open alexeyegorov opened 1 year ago

alexeyegorov commented 1 year ago

Expected Behavior

I use dbx sync during development, and it works throughout the development cycle. At the end of the process, or at points during it, I perform a git rebase. The changes resulting from the rebase should sync as well.

Current Behavior

During or after the rebase, dbx sync fails with the following error:


[dbx][2023-02-07 11:15:58.828] Putting /Repos/Alexey.Egorov@lampenwelt.de/lotus-ml/notebooks/image_similarity/autoencoder/DeployModel.py
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/commands/sync/sync.py:318 in │
│ repo                                                                                             │
│                                                                                                  │
│   315 │                                                                                          │
│   316 │   client = ReposClient(user=user_name, repo_name=dest_repo, config=config)               │
│   317 │                                                                                          │
│ ❱ 318 │   main_loop(                                                                             │
│   319 │   │   source=source,                                                                     │
│   320 │   │   matcher=matcher,                                                                   │
│   321 │   │   client=client,                                                                     │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/commands/sync/functions.py:1 │
│ 29 in main_loop                                                                                  │
│                                                                                                  │
│   126 │   # Run the incremental copy and record how many operations were performed or would ha   │
│   127 │   # performed (if in dry run mode).  An operation usually translates to an API call, s   │
│   128 │   # create a directory, put a file, etc.                                                 │
│ ❱ 129 │   op_count = syncer.incremental_copy()                                                   │
│   130 │                                                                                          │
│   131 │   if not op_count:                                                                       │
│   132 │   │   dbx_echo("No changes found during initial copy")                                   │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/__init__.py:449 in      │
│ incremental_copy                                                                                 │
│                                                                                                  │
│   446 │   │                                                                                      │
│   447 │   │   # Use the diff between current snapshot and previous snapshot to apply the same    │
│   448 │   │   # against the remote location.                                                     │
│ ❱ 449 │   │   op_count = asyncio.run(self._apply_snapshot_diff(diff))                            │
│   450 │   │                                                                                      │
│   451 │   │   self.last_snapshot = snapshot                                                      │
│   452                                                                                            │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/asyncio/runners.py:44 in run                   │
│                                                                                                  │
│   41 │   │   events.set_event_loop(loop)                                                         │
│   42 │   │   if debug is not None:                                                               │
│   43 │   │   │   loop.set_debug(debug)                                                           │
│ ❱ 44 │   │   return loop.run_until_complete(main)                                                │
│   45 │   finally:                                                                                │
│   46 │   │   try:                                                                                │
│   47 │   │   │   _cancel_all_tasks(loop)                                                         │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/asyncio/base_events.py:649 in                  │
│ run_until_complete                                                                               │
│                                                                                                  │
│    646 │   │   if not future.done():                                                             │
│    647 │   │   │   raise RuntimeError('Event loop stopped before Future completed.')             │
│    648 │   │                                                                                     │
│ ❱  649 │   │   return future.result()                                                            │
│    650 │                                                                                         │
│    651 │   def stop(self):                                                                       │
│    652 │   │   """Stop running the event loop.                                                   │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/__init__.py:243 in      │
│ _apply_snapshot_diff                                                                             │
│                                                                                                  │
│   240 │   │   │   op_count += await self._apply_dirs_created(diff, session)                      │
│   241 │   │   │   op_count += await self._apply_files_created(diff, session)                     │
│   242 │   │   │   op_count += await self._apply_files_deleted(diff, session, deleted_dirs)       │
│ ❱ 243 │   │   │   op_count += await self._apply_files_modified(diff, session)                    │
│   244 │   │                                                                                      │
│   245 │   │   return op_count                                                                    │
│   246                                                                                            │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/__init__.py:209 in      │
│ _apply_files_modified                                                                            │
│                                                                                                  │
│   206 │   │   return await self._apply_file_puts(session, diff.files_created, "created")         │
│   207 │                                                                                          │
│   208 │   async def _apply_files_modified(self, diff: SnapshotDiff, session: aiohttp.ClientSes   │
│ ❱ 209 │   │   return await self._apply_file_puts(session, diff.files_modified, "modified")       │
│   210 │                                                                                          │
│   211 │   async def _apply_files_deleted(                                                        │
│   212 │   │   self, diff: SnapshotDiff, session: aiohttp.ClientSession, deleted_dirs: List[str   │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/__init__.py:202 in      │
│ _apply_file_puts                                                                                 │
│                                                                                                  │
│   199 │   │   │   else:                                                                          │
│   200 │   │   │   │   dbx_echo(f"(noop) File {msg}: {path}")                                     │
│   201 │   │   if tasks:                                                                          │
│ ❱ 202 │   │   │   await asyncio.gather(*tasks)                                                   │
│   203 │   │   return op_count                                                                    │
│   204 │                                                                                          │
│   205 │   async def _apply_files_created(self, diff: SnapshotDiff, session: aiohttp.ClientSess   │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/__init__.py:196 in task │
│                                                                                                  │
│   193 │   │   │   │   │   # Files can be created in parallel, but we limit how many are opened   │
│   194 │   │   │   │   │   # so we don't use memory excessively.                                  │
│   195 │   │   │   │   │   async with sem:  # noqa                                                │
│ ❱ 196 │   │   │   │   │   │   await self.client.put(get_relative_path(self.source, p), p, sess   │
│   197 │   │   │   │                                                                              │
│   198 │   │   │   │   tasks.append(task(path))                                                   │
│   199 │   │   │   else:                                                                          │
│                                                                                                  │
│ /usr/local/Caskroom/miniconda/base/lib/python3.10/site-packages/dbx/sync/clients.py:273 in put   │
│                                                                                                  │
│   270 │   │   │   │   │   else:                                                                  │
│   271 │   │   │   │   │   │   txt = await resp.text()                                            │
│   272 │   │   │   │   │   │   dbx_echo(f"HTTP {resp.status}: {txt}")                             │
│ ❱ 273 │   │   │   │   │   │   raise ClientError(resp.status)                                     │
│   274                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ClientError: 404

Steps to Reproduce (for bugs)

Context

Your Environment

matthayes commented 1 year ago

Hey, thanks for the report. I wonder if this is related to #280. The 404 on a put seems to suggest that the parent directory does not exist on the remote side for some reason. Does this happen consistently when you rebase, or is it sporadic?
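
To make the hypothesis concrete: if a rebase causes a file to be reported as modified while its parent directory is missing remotely, the put would target a path that does not exist. Below is a minimal sketch, not dbx's actual code, of deriving which directories would have to exist before a batch of puts. `parent_dirs_for_puts` is a hypothetical helper introduced only for illustration, and the example path is the one from the log above, relative to the repo root.

```python
from pathlib import PurePosixPath


def parent_dirs_for_puts(file_paths):
    """Return every ancestor directory of the given relative file paths,
    ordered so that each parent comes before its children.

    Hypothetical helper for illustration; not part of dbx.
    """
    dirs = set()
    for path in file_paths:
        for parent in PurePosixPath(path).parents:
            # Skip the sync root itself ("." for relative, "/" for absolute paths).
            if str(parent) not in (".", "/"):
                dirs.add(str(parent))
    # Shallower directories sort first, so "a" precedes "a/b".
    return sorted(dirs, key=lambda d: (d.count("/"), d))


# Example: files a rebase might mark as modified.
modified = [
    "notebooks/image_similarity/autoencoder/DeployModel.py",
    "notebooks/README.md",
]
for directory in parent_dirs_for_puts(modified):
    print("must exist before put:", directory)
```

If any directory in that list were absent on the remote side when the modified files are put, a 404 like the one in the traceback would be a plausible outcome. Whether that is what happens here is still an open question.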