delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

docs: clarify locking mechanism requirement for S3 #2558

Closed inigohidalgo closed 1 month ago

inigohidalgo commented 1 month ago

closes #2556

#2069 also had the same confusion.

github-actions[bot] commented 1 month ago

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

inigohidalgo commented 1 month ago

> ACTION NEEDED
>
> delta-rs follows the [Conventional Commits specification]

I have updated the PR title.

ion-elgreco commented 1 month ago

Can you mention the exception for Cloudflare R2, see this PR: https://github.com/delta-io/delta-rs/pull/2083

ion-elgreco commented 1 month ago

Here is an example for Cloudflare R2 (accessed through the S3 API): https://github.com/delta-io/delta-rs/issues/1356#issuecomment-1736639674
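
For readers following along, a minimal sketch of what such an R2 configuration might look like from Python. The helper name `r2_storage_options`, the bucket, and the credentials are hypothetical. The `aws_copy_if_not_exists` key and its `header: ...` value are written as recalled from the `object_store` crate documentation and should be checked against the linked comment and the installed version before use.

```python
# Sketch only: the option names below are assumptions recalled from the
# object_store / delta-rs docs; verify them against the version you use.

def r2_storage_options(account_id: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Build storage options for writing a Delta table to Cloudflare R2.

    R2 can perform an atomic "copy if not exists" when a custom header is
    sent, which is why no separate locking provider is configured here.
    """
    return {
        "AWS_ENDPOINT_URL": f"https://{account_id}.r2.cloudflarestorage.com",
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": "auto",
        # Ask the store to send R2's conditional-copy header on commit.
        "aws_copy_if_not_exists": "header: cf-copy-destination-if-none-match: *",
    }


# Hypothetical placeholder values, for illustration only.
options = r2_storage_options("my-account", "my-key", "my-secret")

# With deltalake installed, the options would be passed along these lines:
# from deltalake import write_deltalake
# write_deltalake("s3://my-bucket/my-table", data, storage_options=options)
```

Whether MinIO honours an equivalent header is not established in this thread, so the same dictionary should not be assumed to work there without checking the MinIO docs.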

inigohidalgo commented 1 month ago

To make the docstrings explicit: what is the current status of other backends like MinIO? Can concurrent writes be enabled for them with DynamoDB? Or does that only apply to AWS?

ion-elgreco commented 1 month ago

> To make the docstrings explicit: what is the current status of other backends like MinIO? Can concurrent writes be enabled for them with DynamoDB? Or does that only apply to AWS?

I'm not sure about MinIO. @wjones127 mentions it supports custom headers, so the same approach used for R2 might work for MinIO as well.
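
For comparison, a minimal sketch of the storage options typically used on AWS S3, where concurrent writers rely on a DynamoDB-backed lock. The helper name `s3_storage_options` is hypothetical. The keys `AWS_S3_LOCKING_PROVIDER`, `DELTA_DYNAMO_TABLE_NAME`, and `AWS_S3_ALLOW_UNSAFE_RENAME`, and the default table name `delta_log`, are written as recalled from the delta-rs docs and should be treated as assumptions to verify.

```python
# Sketch only: key names are assumptions recalled from the delta-rs docs.

def s3_storage_options(concurrent_writers: bool, dynamo_table: str = "delta_log") -> dict[str, str]:
    """Return storage options for writing a Delta table to AWS S3.

    With concurrent writers, a DynamoDB table coordinates commits.
    Without a lock, writes must be explicitly marked as unsafe, and the
    caller is responsible for ensuring there is only a single writer.
    """
    if concurrent_writers:
        return {
            "AWS_S3_LOCKING_PROVIDER": "dynamodb",
            "DELTA_DYNAMO_TABLE_NAME": dynamo_table,
        }
    # Single-writer setup: no locking provider, unsafe rename allowed.
    return {"AWS_S3_ALLOW_UNSAFE_RENAME": "true"}


locked = s3_storage_options(concurrent_writers=True)
unlocked = s3_storage_options(concurrent_writers=False)
```

Either dictionary would be passed as `storage_options` to the write call. Whether the DynamoDB route is usable with non-AWS backends such as MinIO is the open question in this thread.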

inigohidalgo commented 1 month ago

Ok. Should I add a small section at the end (or beginning) of the S3 section with the example from the linked issue and also indicate that the same might be possible in MinIO? Or should I just mention the known R2 compatibility?

ion-elgreco commented 1 month ago

You can mention that MinIO might work in a similar fashion, but that this needs to be double-checked against their docs, or something along those lines.

inigohidalgo commented 1 month ago

I just spotted this section in the README: https://github.com/delta-io/delta-rs/tree/b05d7e90dc0717b39c7fca35ec3c99c251aee839?tab=readme-ov-file#cloud-integrations

Should I change anything there for MinIO?

inigohidalgo commented 1 month ago

I have added some cross-referencing links in the paragraphs I added, so I wanted to check that they work on my end before publishing the PR. I am having some trouble getting the docs to build: the combined Rust + Python docs builds are throwing me off a bit (I'm just used to Sphinx).

cd python
make develop  # crate and wheel build successfully
make build-docs  # installs all requirements correctly but then fails when building
INFO    -  [macros] - Macros arguments: {'module_name': 'docs/_build/macro', 'modules': [], 'render_by_default': True, 'include_dir': '', 'include_yaml': [], 'j2_block_start_string': '', 'j2_block_end_string': '',
           'j2_variable_start_string': '', 'j2_variable_end_string': '', 'on_undefined': 'keep', 'on_error_fail': False, 'verbose': False}
INFO    -  [macros] - Found local Python module 'docs/_build/macro' in: /Users/inigo/Programming/Repos/delta-rs
INFO    -  [macros] - Found external Python module 'docs/_build/macro' in: /Users/inigo/Programming/Repos/delta-rs
INFO    -  [macros] - Extra variables (config file): ['python_api_url', 'generator', 'social']
INFO    -  [macros] - Extra filters (module): ['pretty']
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: /Users/inigo/Programming/Repos/delta-rs/site
INFO    -  mkdocstrings_handlers: Formatting signatures requires Black to be installed.
ERROR   -  Error reading page 'api/exceptions.md': Could not resolve alias deltalake._internal.DeltaError pointing at _internal.DeltaError (in python/deltalake/_internal.abi3.so:None)
Traceback (most recent call last):
  File "/Users/inigo/Programming/Repos/delta-rs/.venv/lib/python3.10/site-packages/griffe/dataclasses.py", line 1373, in _resolve_target
    resolved = self.modules_collection.get_member(self.target_path)
  File "/Users/inigo/Programming/Repos/delta-rs/.venv/lib/python3.10/site-packages/griffe/mixins.py", line 78, in get_member
    return self.members[parts[0]].get_member(parts[1:])  # type: ignore[attr-defined]
KeyError: '_internal'

If I open a Python interpreter, I can access deltalake.exceptions.DeltaError without issue, so I'm a bit lost as to what the problem could be.

inigohidalgo commented 1 month ago

I have verified the documentation builds on my end, and the links I added work okay.

inigohidalgo commented 1 month ago

https://github.com/delta-io/delta-rs/actions/runs/9332194054/job/25687819730?pr=2558

Oops. I forgot that I had this same failure locally (https://github.com/delta-io/delta-rs/pull/2558#issuecomment-2143366348) and that I had commented out the exceptions in docs/api/exceptions.md to get the build to pass. I don't see how this can be related to my PR, though.

inigohidalgo commented 1 month ago

Okay, I see that action has been failing for a while.