W1ldPo1nter / django-queryable-properties

Write Django model properties that can be used in database queries.
BSD 3-Clause "New" or "Revised" License
71 stars 1 forks source link

Do all uses of queryable_property.annotater() require a top-most output_field? #15

Closed ShaheedHaque closed 7 months ago

ShaheedHaque commented 7 months ago

I recently turned on usage of my 20-odd uses of queryable_property.annotater() using .select_properties() and was caught out by what appeared to be "unreliable" results in cases like this (note that the output is intended to be a float):

    @queryable_property
    def duration(self) -> float:
        return ...

    @duration.annotater
    @classmethod
    def duration(cls):
        return Case(
            When(finished__isnull=True, then=Value(0.0, output_field=FloatField())),
            default=ExpressionWrapper(
                ...
                output_field=FloatField()
            )
        )

where I'd "sometimes" get exceptions like this (note that the output is some random string and NOT a float):

Traceback (most recent call last):,
  ...
  File "/home/srhaque/repo/source/paiyroll/views/employee/mobile_views.py", line 714, in listing_data,
    for existing in qs:,
  File "/home/srhaque/venv/lib/python3.11/site-packages/django/db/models/query.py", line 398, in __iter__,
    self._fetch_all(),
  File "/home/srhaque/venv/lib/python3.11/site-packages/django/db/models/query.py", line 1881, in _fetch_all,
    self._result_cache = list(self._iterable_class(self)),
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^,
  File "/home/srhaque/venv/lib/python3.11/site-packages/queryable_properties/managers.py", line 52, in __iter__,
    for obj in super(QueryablePropertiesIterableMixin, self).__iter__():,
  File "/home/srhaque/venv/lib/python3.11/site-packages/django/db/models/query.py", line 121, in __iter__,
    for row in compiler.results_iter(results):,
  File "/home/srhaque/venv/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1500, in apply_converters,
    value = converter(value, expression, connection),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^,
  File "/home/srhaque/venv/lib/python3.11/site-packages/django/db/models/expressions.py", line 363, in <lambda>,
    else float(value),
         ^^^^^^^^^^^^,
ValueError: could not convert string to float: 'Book holiday(2018-08-07..2018-08-30)'

This was pretty difficult to debug because the string Book holiday(2018-08-07..2018-08-30) is from some unrelated use of queryable_property.annotater(), but I eventually found that adding an explicit output_field to the above annotater like this caused the problem to go away:

    @duration.annotater
    @classmethod
    def duration(cls):
        return Case(
            When(finished__isnull=True, then=Value(0.0, output_field=FloatField())),
            default=ExpressionWrapper(
                ...
                output_field=FloatField()
            ), output_field=FloatField()  <<<<<<<<<<< newly added
        )

I'm not familiar enough with the layers involved, but this raises the following questions:

  1. Why the extra outermost output_field is needed?
  2. Why the ValueError exception is in the context of a completely different property?

Any insights would be most welcome. I guess this is not an issue for django-queryable-properties as such, but if the outer output_field is indeed needed, then maybe a hint in the docs to that effect might be warranted.

W1ldPo1nter commented 7 months ago

Regarding the title question: they shouldn't. Or to be more precise: they have the same requirements that regular .annotate() calls have since django-queryable-properties does pretty much just allow to define re-usable annotations that are added to querysets intelligently.

This is also why there's only one occurrence of django-queryable-properties code in your stack trace, which is also just a super() call that only invokes Django's code again. Whether or not an annotation requires an explicit output field is always entirely up to Django to decide. I suggest adding same annotation to a queryset via .annotate() without involving queryable properties at all to check if the result is any different.

For your specific case, I'm wondering whether you could just drop the inner output fields entirely and just keep the outer output field definition only. If that works, then you'd only have to specifiy the output field once and you may even be able to get rid of the ExpressionWrapper, although that might depend on the rest of the expression being wrapped. In my mind, it also makes more sense to just specify the output field on the outer level since the property/annotation should always use a single type anyway.

The exception message is pretty weird indeed. I guess I'll try to construct something similar and check if there's anything weird going on that is django-queryable-properties' fault or if there's anything that can be improved without having to fiddle with Django's internals more than necessary.

Your point regarding the docs is valid and I'll see whether I can make some improvements here so people can avoid stepping into "traps" like these. I previously didn't document anything about output fields for properties with annotater specifically since it's completely up to Django whether or not an annotation requires an explicit output field and it's therefore basically just "the rules for regular annotations apply". However, I do agree that a hint could be helpful here.

ShaheedHaque commented 7 months ago

Regarding the title question: they shouldn't. Or to be more precise: they have the same requirements that regular .annotate() calls have since django-queryable-properties does pretty much just allow to define re-usable annotations that are added to querysets intelligently.

This is also why there's only one occurrence of django-queryable-properties code in your stack trace, which is also just a super() call that only invokes Django's code again. Whether or not an annotation requires an explicit output field is always entirely up to Django to decide. I suggest adding same annotation to a queryset via .annotate() without involving queryable properties at all to check if the result is any different.

Indeed. And from some tests, the results were inconclusive because, despite the test suite being reproducible in normal use, I appeared to be getting "random" results as far as these exceptions are concerned. (I'm all too aware of how implausible that sounds).

For your specific case, I'm wondering whether you could just drop the inner output fields entirely and just keep the outer output field definition only. If that works, then you'd only have to specifiy the output field once and you may even be able to get rid of the ExpressionWrapper, although that might depend on the rest of the expression being wrapped. In my mind, it also makes more sense to just specify the output field on the outer level since the property/annotation should always use a single type anyway.

As a case in point, I did indeed remove the output_field in "Value(0.0, output_field=FloatField()" and then merge the change after running the tests. Several unrelated bugfixes and merge after testing followed, and then my tests started failing with a similar float() Value Error. I then restored this inner output_field, and the error went away.

This issue was filed after that. (Needless to say, I'm concerned about ALL my call sites being similarly unreliable...).

The exception message is pretty weird indeed. I guess I'll try to construct something similar and check if there's anything weird going on that is django-queryable-properties' fault or if there's anything that can be improved without having to fiddle with Django's internals more than necessary.

That's just what I was hoping for. Could some of the caching that goes on be implicated by the inconsistent behaviour?

Your point regarding the docs is valid and I'll see whether I can make some improvements here so people can avoid stepping into "traps" like these. I previously didn't document anything about output fields for properties with annotater specifically since it's completely up to Django whether or not an annotation requires an explicit output field and it's therefore basically just "the rules for regular annotations apply". However, I do agree that a hint could be helpful here.

ShaheedHaque commented 7 months ago

...and another one started failing. Curiously, both are to do with a Case(When()). The original mentioned above:

-            When(finished__isnull=True, then=Value(0.0)),
+            When(finished__isnull=True, then=Value(0.0, output_field=FloatField())),

and now this one:

-    Case(*[When(frequency=v, then=Value(_(l))) for v, l in FREQUENCY_CHOICES], output_field=CharField())
+    Case(*[When(frequency=v, then=Value(_(l), output_field=CharField())) for v, l in FREQUENCY_CHOICES], output_field=CharField())
W1ldPo1nter commented 7 months ago

I suggest adding same annotation to a queryset via .annotate() without involving queryable properties at all to check if the result is any different.

I didn't quite catch it from your response, but did you by chance try this, i.e. using the same annotation without involving queryable properties at all? It would be interesting if the annotation works outside of queryable properties without the outer output field or if that also leads to an error (maybe with a more useful message).

Curiously, both are to do with a Case(When()).

Could also be an issue that just affects these kinds of expressions, yes. I saw that my implementation of the MappingProperty (which is based on CASE/WHEN) also explicitly requires the output field, although I'm not setting inner output fields there. This specialized property does something similar as your second example (at least by the looks of it) and works in my test setup, even when being selected.

What Django version are you currently using?

ShaheedHaque commented 7 months ago

did you by chance try this, i.e. using the same annotation without involving queryable properties at all?

No, not tried that specifically.

What Django version are you currently using?

I have two slightly different setups...

  1. Machine A which seems not to show the problem: Django 4.2.7 and django-queryable-properties 1.8.5.
  2. Machine B which often shows the problem: Django 4.2.5 and django-queryable-properties 1.8.4

(There are other differences, but these are the obvious ones). Let me try gradually syncing them up.

ShaheedHaque commented 7 months ago

OK:

While that is welcome news, given the sporadic nature of the problem, I cannot say for sure if it is gone. I have also browsed the changes between 1.8.4 and 1.8.5 and - at least to my eye - saw nothing that might explain this.

W1ldPo1nter commented 7 months ago

Now that you are mentioning those two versions, I realize that the changes between them could actually explain all of this. The changelog entry

Selected queryable properties are no longer aliased with a unique name in queries and use their regular name instead

was actually a somewhat larger change under the hood in terms of how exactly selecting properties works internally. (Summary: Before the change, they received aliases when being selected to avoid that Django simply calls setattr with the property name, which would invoke the property's setter. The property value then had to be moved over from the alias attribute to the cache of the actual property. After the change, the aliasing is no longer happening, so the setattr call with the property happens, but django-queryable-properties now makes a distinction at this point to not call the actual setter implementation and just cache the value instead.)

Since this change also fixed other issues, I could imagine that it even fixed some more issues that were previously unknown to me and thus the update may actually fix your problem as well. I hope that the problem stays gone and can only recommend to use the latest version of django-queryable-properties to benefit from all bugfixes, even if they are somewhat hidden (not intentionally, of course).

ShaheedHaque commented 7 months ago

Cool. I propose to close this issue on that basis; we can always reopen it if needed.