Attumm / redis-dict

Python dictionary with Redis as backend, built for large datasets. Simplifies Redis operations for large-scale and distributed systems. Supports various data types, namespacing, pipelining, and expiration.
MIT License
51 stars 14 forks source link

Inherit from MutableMapping #53

Closed m3at closed 4 weeks ago

m3at commented 2 months ago

Hi, thanks for the useful lib.

Small change to inherit from MutableMapping to improve downstream type hints, for example to more easily accept either a dict or a RedisDict.

Attumm commented 1 month ago

@m3at

Improving type hinting would be a great addition. If we could have some unit tests added to this MR, we can explore your change further. Example

Maybe we can explore type hinting for editors; I normally write code with Vim. So type hinting and autocompletion are currently blind spots in this project.

Falls under Known unknowns currently.

m3at commented 1 month ago

Hi Attumm.

Adding new unit tests might not be the best way to validate typing. It's most common to use a type checker, mypy being the reference. Personally I like to use beartype, restricting it to check during tests only for codebases that are not quite ready for runtime type checks (there's a package for pytest, but you could do something similar with the stdlib's untittest that you're using in redis-dict). I won't have time to add this to redis-dict in the short term, but maybe in the future.


Maybe we can explore type hinting for editors; I normally write code with Vim. So type hinting and autocompletion are currently blind spots in this project.

I'm on nvim, with the built-in lsp and pyright + ruff for python (surely you can do the same in vim). This is how I spotted the lack of type in redis-dict 😉

Attumm commented 1 month ago

Today, I'll have time to go over the change and consider what the downstream effects might entail. It seems like a good addition to explicitly state our intent to adhere to the MutableMapping interface, which is more general than just the dict interface. This exercise will improve the maturity of the codebase. Using LSP and editors for Python is becoming really popular, there are even those that favour classes for autocomplete over dictionaries.

It's interesting to note how times have changed from 2012

I'll add some tests to this MR

Attumm commented 1 month ago

The last few weeks have been quite busy, so I haven't had the chance to elaborate on my thoughts about the comment regarding unit testing . While Python's dynamic nature is powerful, it can lead to unexpected behavior if not carefully managed. This often requires more conservative mindset for libraries and additional testing. However, in this case, it allows us to test compliance with abstract base class interfaces. By shifting our perspective on unit tests, we can extract more value from them. Adding tests for new functionality, assumptions, features, allows us to explore downstream effects and share code. Crucially, unit tests serve as documentation, ensuring ongoing compatibility and facilitating knowledge sharing throughout the project's lifetime. Thus, that was the intended meaning of 'unit tests': unit tests that evolve from mere verification tools to become cornerstone components of library code projects. By borrowing ideas from TDD, BDD. test/behavior driven development.

Greybeard story The small inheritance change reminded me of a story. Almost a decade ago, I worked on a mission-critical Python project that implemented novel way for parallelism and ran every five minutes. One particular day, we received more warnings than usual, and we quickly found the issue the hotfix was small and only changed a basic calculation, but that minor change caused Python's method resolution order to exhibit unexpected behavior during startup, preventing the deployment of the hotfix and we only got a very cryptic error message about being unable to call a method on a class. Oddly enough, this issue occurred only on the production machine. The solution involved further complicating the novel parallelism by adding the complexity of separating the scheduler from the workers, pinning the scheduler to a previous working git commit, and creating a new project with the latest hotfix for the workers. but the time pressure was enormous, as we would miss another batch of data every five minutes. Python dynamic nature got us in and out of that mess.

This experience, combined with Python's lack of Java-like compiler support for classes such as dead code elimination and its runtime solution, has made me cautious about Python inheritance.

Returning to the issue at hand, I created the tests I had in mind and tried to convey with my first comment, they initially seemed promising for the PR, with only minor differences. However Mypy told a different story. You can find the details in this commit.

Mypy results when using abstract classes clearly indicate that we'll need to reconsider our approach. While the PR aimed for quick improvements in LSP (Language Server Protocol), IDE, and code analyzer integration, there are still alternative methods to achieve the goal of the PR.. Granted not nearly as simple and straightforward. Today provided me with a rare opportunity to deep dive and I'll share my findings.

I learned that significant changes have occurred within the Python ecosystem regarding new conventions and best practices that have solidified into standards, particularly regarding type hinting. Since the start of this project. The codebase, originally initiated around 2017 for Python 2, later integrated support for Python 3 before ultimately phasing out Python 2. However, some remnants of the original Python 2 implementation remain, with the package directory structure being one of them. At that time, Python wasn't as widely adopted, and many conventions were influenced by prominent organizations in the Python community, such as Google. These conventions and best practices often overlapped, and their longevity was not guaranteed.

To summarize this PR isn't viable,Although it has been quite enlightening and the outcome has been positive. I initially assumed that docstrings and type hinting would suffice, but your PR prompted me to explore the inner workings of LSP, particularly in the context of Python. As a result, I've identified six potential improvements that could enhance LSP, IDE, and code analyzer integration, leading to a better experience for users of redis-dict. While the list may not be exhaustive, I'd love to discuss these with you and hear your thoughts.

  1. py.typed file: Signals type checker compliance, could improve type inference.PEP 561
  2. Python Protocols: Could provide better autocompletion, type checking, and error detection for duck-typed code without runtime overhead PEP 544
  3. .pyi stub file: Provides separate type information PEP 484 (Stub Files)
  4. Mypy configuration: configure Mypy at package/project level for consistent type checking and integration. Mypy configuration file
  5. __init__.py: Adhere to package layout standards with __init__.py containing __all__ to explicitly define the package's interface. Improve code analysis layout
  6. pyproject.toml: Adhere to the latest convention. pyproject.toml

However, some of these changes may introduce additional maintenance costs in the future, such as duplication. That said, given the maturity of the project, interoperability should take precedence.


Adding @Mark90, what are your thoughts on this?

Mark90 commented 1 month ago

It's been a while I used this library, but I'll give my 2c.

There already seem to be type annotations on RedisDict, so adding py.typed was the first thing I thought of as well. It just needs to be present as an empty file to make linters like mypy bother checking the annotations.

As for inheriting from MutableMapping; I can see why that would be helpful if one wants to pass a RedisDict instance to 3rd party code that explicitly expects a MutableMapping. But the mypy error from d66aca518493c3defc76bd9908ae73daa219a5d7 looks.. complicated :) I might be wrong but I don't think the protocol/pyi approaches would solve that. As a hacky workaround, users could do redis_dict_instance = cast(MutableMapping, redis_dict_instance) to solve such linting errors.

m3at commented 4 weeks ago

Thank you for the context and the thorough explanations. From my limited look at redis-dict, I also think that py.typed might be the easiest path. But the overall maintainability of the project is clearly more important! It's good to discuss but this is not a priority.

(let me close this merge request as it served its purpose)