Closed zero323 closed 2 years ago
@zero323 my apologies for not having taken a look at this PR. I am unfamiliar with this aspect of python. can you tell me more about these annotations? Specifically
@zero323 my apologies for not having taken a look at this PR.
No worries.
I am unfamiliar with this aspect of python. can you tell me more about these annotations? Specifically
- How does it affect forward and backward compatibility?
In case of stub files there is no practical on forward or backward compatibility (inline annotation, not proposed here, are not backward compatible beyond Python 3.5). Annotations are only used by external tools, and have no impact on actual execution.
- Does these annotations always have to be in sync with the code? What happens if they are not in sync (e.g., we add a method in the python code but forget to add it to the annotation)?
Obviously this is the preferred to keep things consistent, as annotations which are not synced, provide misleading feedback. This might be a problem, if user use CI pipelines, which are aware of type annotations (type checkers and such).
Other than that, there is no impact at all (might be in the future, if users couple things with emerging tools like mypyc
).
@zero323 Thanks for the contribution. Are you still interested in this? If so, could you update this? This would be very helpful and would help us find issues such as #726
@zsxwing Sure.
Since I opened this PR, Spark started moving from stubs to inline hints. Should we consider the same here?
@tdas
- can you show examples of where inline hints are used in spark. i still see a lot of stubs .pyi in the spark file.
- are there any version constraints of using either stubs or hints within python 3?
If you're asking about versions that are not passed their End-of-life, then no. However, there are some considerations regarding usage of special features (Literals
, Protocols
etc.), which, depending on a version, might require typing-extensions
or might be unavailable.
@zsxwing @tdas Annotations and supporting code should be up-to-date now.
So, about switching to in-line annotations.... Any thoughts? If you're motivation is to support delta devs, as much as the users, in-line annotations are the way to go (in general type checkers don't validate the code, when corresponding stubs are present. This has pros and cons, but doesn't seem like what you're looking for now).
Since I opened this PR, Spark started moving from stubs to inline hints. Should we consider the same here?
@zero323 Thanks for pointing out this. I think that's better. Could you update your PR using inline hints?
Could you update your PR using inline hints?
I'll work on that, but it might take a while.
@zsxwing, could you be so kind and approve the workflow? TIA!
could you be so kind and approve the workflow? TIA!
Done. Thanks for the update!
Hi @zero323. We are reviewing this now. Please know, we won't get around to merging this until after the Delta OSS 1.1 release
Thanks for the update @scottsand-db
@xinrong-databricks and @ueshin what's the status on the re-review?
LGTM if tests passed. Thanks!
Hi @zero323 thanks for all the work on this PR. Almost there! this PR is failing python tests since commit 417dc53
with error (below). Can you please take a look and fix the failing tests?
python/delta/tests/test_deltatable.py:404: error: Invalid self argument "RDD[Row]" to attribute function "collectAsMap" with type "Callable[[RDD[Tuple[K, V]]], Dict[K, V]]" [misc]
python/delta/tests/test_deltatable.py:404: error: Incompatible types in assignment (expression has type "Dict[<nothing>, <nothing>]", variable has type "Dict[str, str]") [assignment]
Found 2 errors in 1 file (checked 12 source files)
Deleted /home/runner/.ivy2/cache/io.delta
### Executing cmd: /home/runner/work/delta/delta/build/sbt clean publishM2
### Executing cmd: /home/runner/work/delta/delta/dev/lint-python
##### Running mypy tests #####
### Executing cmd: mypy --config-file /home/runner/work/delta/delta/python/mypy.ini /home/runner/work/delta/delta/python/delta
Traceback (most recent call last):
File "python/run-tests.py", line 175, in <module>
run_mypy_tests(root_dir)
File "python/run-tests.py", line 113, in run_mypy_tests
], stream_output=True)
File "python/run-tests.py", line 83, in run_cmd
raise Exception("Non-zero exitcode: %s" % (exit_code))
Exception: Non-zero exitcode: 1
##### Running SBT tests #####
##### Running Python tests #####
Calling script %s python/run-tests.py
Traceback (most recent call last):
File "run-tests.py", line 76, in <module>
run_python_tests(root_dir)
File "run-tests.py", line 34, in run_python_tests
run_cmd(["python", python_test_script], stream_output=True)
File "run-tests.py", line 46, in run_cmd
raise Exception("Non-zero exitcode: %s" % (exit_code))
Exception: Non-zero exitcode: 1
Error: Process completed with exit code 1.
@scottsand-db I've seen this and I am investigating, but getting inconsistent behavior for ignores between local dev environment
➜ delta git:(annotations) git rev-parse HEAD
4ea8fff69a4631c91984abcc854a8fedcd26677c
➜ delta git:(annotations) mypy --version
mypy 0.910
➜ delta git:(annotations) rm -Rf .mypy_cache
➜ delta git:(annotations) sha256sum python/delta/tests/test_deltatable.py
d34acbe3ad488c65b1734fadf03e1732161f2c8a27e26b8ece72bb39d4030670 python/delta/tests/test_deltatable.py
➜ delta git:(annotations) mypy --no-incremental --config-file python/mypy.ini python/delta
Success: no issues found in 12 source files
and CI setup
dev@7fd042106847:~/delta$ git rev-parse HEAD
4ea8fff69a4631c91984abcc854a8fedcd26677c
dev@7fd042106847:~/delta$ pipenv run mypy --version
mypy 0.910
dev@7fd042106847:~/delta$ sha256sum python/delta/tests/test_deltatable.py
d34acbe3ad488c65b1734fadf03e1732161f2c8a27e26b8ece72bb39d4030670 python/delta/tests/test_deltatable.py
dev@7fd042106847:~/delta$ rm -Rf .mypy_cache/
dev@7fd042106847:~/delta$ pipenv run mypy --no-incremental --config-file python/mypy.ini python/delta
python/delta/tests/test_deltatable.py:404: error: Invalid self argument "RDD[Row]" to attribute function "collectAsMap" with type "Callable[[RDD[Tuple[K, V]]], Dict[K, V]]" [misc]
python/delta/tests/test_deltatable.py:404: error: Incompatible types in assignment (expression has type "Dict[<nothing>, <nothing>]", variable has type "Dict[str, str]") [assignment]
Found 2 errors in 1 file (checked 12 source files)
I'll ping you all, once I figure out what causes the issue.
Current error seems to be unrelated to PR (already present on https://github.com/delta-io/delta/commit/5571d95966e3b03c4699dcfa1a5cd86bb16447dc)
ImportError: cannot import name 'InstallCommand' from 'pipenv.vendor.pip_shims.shims' (/usr/local/lib/python3.8/dist-packages/pipenv/vendor/pip_shims/shims.py)
/usr/share/python-wheels/requests-2.22.0-py2.py3-none-any.whl/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
@zero323 I'm fixing it in https://github.com/delta-io/delta/pull/821
@zero323 you can merge master into your PR to fix the build.
Thanks @zsxwing!
I really cannot tell what causes difference in ignore handling between CI and local env, but both seem to be happy with the current position of type: ignore
(seems weird to me, TBH).
@zero323 do we need the empty file python/delta/py.typed
?
@zero323 do we need the empty file
python/delta/py.typed
?
Good catch @scottsand-db! Yes, we definitely do. That's what makes package recognized as supporting type hints (PEP 561). Which reminded me that it should be mentioned in setup (added 267b6be)
Hi @@zero323 , thanks for all of this work. I'm in the process of QAing and merging this ticket.
Can you please provide me with a tangible example of where + how this PR improves python typed annotations? and how you tested that (e.g. in some file X, using IntelliJ/vscode, Y is now visible, whereas Y was not visible before this change)?
I appreciate the help - not much of a python expert myself.
@scottsand-db
Can you please provide me with a tangible example of where + how this PR improves python typed annotations? and how you tested that?
I appreciate the help - not much of a python expert myself.
A good starting point to understand the impact is to see corresponding Spark JIRA ticket. To summarize, there three main aspects:
Precise documentation. Expected input and output types are clearly defined. Furthermore, we can describe possible relations between arguments. For delta
, the most obvious example are overloads of functions with counter-intuitive signatures. For example, take a look at whenMatchedUpdate
.
Because of this design decision
The original signature
is misleading, suggesting that is OK to omit set
. In contrast, annotated code clearly indicates that set
is mandatory:
Ability to statically detect many code issues.
This is already visible in the delta codebase, where mypy can detect incorrect calls within tests intended to fail ‒ hence all the ignores
, i.e.
We focus on mypy here, as the standard Python type checker, but depending on the environment (IDEs can come with their own type checkers providing different set of features) and type checker (there quite a few) your experience might be different.
Finally, type hints enable better syntax completion. Again, specific behavior depends on your environment (@ueshin and @xinrong-databricks might be able to answer how this interacts with Databricks notebooks), but here is a little demo how experience changes in IPython (which depends on awesome jedi library by davidhalter ‒ common choice for different projects) between 0f30f6f47761cb01043d84aad182f30cb59b5e31 and 267b6be43b206924165baed3e51660f1379c28db
Thank you!
Thanks all!
This PR adds type annotations for Python module, provided as stub files.
This should: