dask-contrib / dask-deltatable

A Delta Lake reader for Dask
BSD 3-Clause "New" or "Revised" License
46 stars 15 forks source link

Fix up some mypy errors #76

Closed jacobtomlinson closed 7 months ago

jacobtomlinson commented 8 months ago

There were some mypy issues in #71 that were unrelated. Following up here to fix things up. Closes #74

There seems to be two issues when running pre-commit run --all-files:

mrocklin commented 8 months ago

Thanks for handling this @jacobtomlinson . +1 to merge from my perspective.

However, I'm also curious about the failing tests. If you have even more appetite to fix things and want to fix that that would be welcome.

Leaving when to merge up to you.

jacobtomlinson commented 8 months ago

Yeah I'm wondering if upgrading to 0.16 has caused some other issues. Let me dig a little.

jacobtomlinson commented 8 months ago

Ok it looks like deltalake has changed how it reads a few dtypes from the all_primitive_types dataset.

binary                 string[pyarrow]  # detlalake
binary                          object  # dask

timestamp          datetime64[ns, UTC]  # detlalake
timestamp               datetime64[ns]  # dask

It looks like the failing test is already smoothing out some dtype issues and is assuming deltalake is correct so I've updated the test to do the same here.

However I still seem to be running into a problem with the test failing complaining the dtypes are not correct and I'm a little stumped.

mrocklin commented 7 months ago

cc @phofl who might have thoughts on dtypes returned here (I suspect that deltalake is using arrow)

jacobtomlinson commented 7 months ago

cc @jrbourbeau who also opened #74 about this issue.

phofl commented 7 months ago

The string thing seems fine, the utc change is a little bit odd

jacobtomlinson commented 7 months ago

@jrbourbeau @phofl do either of you have thoughts on how I can move this PR forward? I'm a little stuck as to why the assertion is failing.

phofl commented 7 months ago

Do you have the dtypes of both objects?

jacobtomlinson commented 7 months ago

Yeah but when I print them out that appear identical. So something deeper must be happening.

phofl commented 7 months ago

Can you post them here?

codecov-commenter commented 7 months ago

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 70.90%. Comparing base (ec1c90c) to head (696cfd3).

Files Patch % Lines
dask_deltatable/write.py 33.33% 2 Missing :warning:
dask_deltatable/core.py 0.00% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #76 +/- ## ========================================== - Coverage 71.62% 70.90% -0.73% ========================================== Files 6 6 Lines 356 354 -2 ========================================== - Hits 255 251 -4 - Misses 101 103 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

milesgranger commented 7 months ago

I came to give my review, noticed some conflicts and thought I'd make a PR to yours (https://github.com/jacobtomlinson/dask-deltatable/pull/1) in order to help resolve the conflicts caused by https://github.com/dask-contrib/dask-deltatable/pull/78, then sorta noticed that did most of what was done here. My apologies for stepping on your toes here.

jacobtomlinson commented 7 months ago

Thanks for the reviews and iterations on this everyone. I'm going to merge this in now.