areski / django-admin-tools-stats

Django-admin-tools-stats is a Django admin module that allows you to easily create charts on your dashboard, based on specific models and criteria
http://django-admin-tools-stats.readthedocs.org/

Change way how to create dict object from literal #49

Closed tmszi closed 5 years ago

tmszi commented 5 years ago

I get this error 'dictionary update sequence element #0 has length 1; 2 is required'.

areski commented 5 years ago

Could you give more information about the issue? Which element are you passing?

timthelion commented 5 years ago

OK. This is actually our bug exacerbated by the unfortunate behavior of pip and Pipenv. You can close this PR but I wanted to share what happened because I think it is quite interesting :). It also took me 5 hours to diagnose :(

We ran into this bug after doing pipenv lock. Other than pipenv lock there actually weren't any changes to our code, so the problem had to be in a dependency somewhere. Unfortunately, pipenv lock updates all packages simultaneously, so it was hard to tell which package update broke things. I guess I could have tried manually editing Pipfile.lock to break that process up into stages, but it didn't occur to me at the time and would have taken a lot of work. Incidentally, as you'll find out if you read to the bottom of this report, that wouldn't have led me to the problem anyway.

The full backtrace was:

File "/app/.heroku/python/lib/python3.6/site-packages/admin_tools/dashboard/templatetags/admin_tools_dashboard_tags.py" in admin_tools_render_dashboard
  47.     dashboard.init_with_context(context)

File "/app/apps/aklub/dashboard.py" in init_with_context
  208.             self.children.append(YearDashboardCharts(**kwargs))

File "/app/.heroku/python/lib/python3.6/site-packages/admin_tools_stats/modules.py" in __init__
  237.         kwargs.setdefault('children', self.get_registration_charts(**kwargs))

File "/app/apps/aklub/dashboard_charts.py" in get_registration_charts
  47.             YearDashboardChart(_('By Day'), interval='days', **kwargs),

File "/app/.heroku/python/lib/python3.6/site-packages/admin_tools_stats/modules.py" in __init__
  84.         self.prepare_template_data(self.data, self.graph_key, self.select_box_value, self.other_select_box_values)

File "/app/.heroku/python/lib/python3.6/site-packages/admin_tools_stats/modules.py" in prepare_template_data
  175.         self.form_field = get_dynamic_criteria(graph_key, select_box_value, other_select_box_values)

File "/app/.heroku/python/lib/python3.6/site-packages/admin_tools_stats/modules.py" in get_dynamic_criteria
  196.                 for key in dict(dy_map):

Exception Type: ValueError at /
Exception Value: dictionary update sequence element #0 has length 1; 2 is required

Looking at the code that raises the exception I see:

def get_dynamic_criteria(graph_key, select_box_value, other_select_box_values):
    """To get dynamic criteria & return into select box to display on dashboard"""
    try:
        temp = ''
        conf_data = DashboardStats.objects.get(graph_key=graph_key).criteria.all()
        for i in conf_data:
            dy_map = i.criteria_dynamic_mapping
            if dy_map:
                temp = '<select name="select_box_' + graph_key + '" onChange="$(this).closest(\'form\').submit();">'
                for key in dict(dy_map):
                    ....

source
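Outside Django, the error is easy to reproduce: dict() applied to a raw JSON string iterates it character by character, and each one-character element can't be unpacked into a key/value pair. A minimal sketch:

```python
# A minimal reproduction of the ValueError, outside Django: calling dict()
# on a raw JSON string iterates its characters, and each single-character
# element can't be unpacked into a (key, value) pair.
raw = '{"-": "all", "onetime": "one-off"}'
try:
    dict(raw)
except ValueError as err:
    print(err)  # dictionary update sequence element #0 has length 1; 2 is required
```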

I look at where dy_map comes from and see that it comes from the db, via conf_data. Looking in our database using manage.py shell_plus, I see that in place of a dict there is a raw JSON string.

>>> dsc=DashboardStatsCriteria.objects.all()[0]
>>> dsc.criteria_dynamic_mapping
'{"-": "V\\u0161ichni", "onetime": "Jednor\\u00e1zov\\u00ed", "promise": "P\\u0159\\u00edslib", "regular": "Pravideln\\u00ed"}'
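Given that the DB can hand back either representation, a defensive normalization could paper over the difference. This is a hypothetical helper, not necessarily the actual fix in this PR:

```python
import json

def as_mapping(value):
    """Hypothetical helper (not the project's actual fix): tolerate both
    an already-deserialized dict and a raw JSON string from the DB."""
    if value is None:
        return {}
    if isinstance(value, str):
        return json.loads(value)
    return value

# Either representation of criteria_dynamic_mapping now yields a dict:
print(as_mapping('{"-": "V\\u0161ichni"}'))  # {'-': 'Všichni'}
print(as_mapping({"-": "Všichni"}))          # {'-': 'Všichni'}
```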

This PR fixes that, but I wanted to know what had changed. This code has been working for us for years as far as I know. I decided to delve deeper, so I took a look at what DashboardStatsCriteria.criteria_dynamic_mapping actually is. It's a JSONField.

    criteria_dynamic_mapping = jsonfield.fields.JSONField(
        null=True, blank=True,
        verbose_name=_("dynamic criteria / value"),
        help_text=_(
            "a JSON dictionary of key-value pairs that will be used for the criteria"
            " Ex. \"{'false': 'Inactive', 'true': 'Active'}\"",
        ),
    )

source

Looking at git blame, neither that code nor the configuration of the field has changed in 6 years!

So I decided to look into what had changed about JSONField itself. Just looking at the pypi project page I see that it is in "maintenance mode", but the code reveals that there have been changes in the past month. Specifically, there is this commit which edits from_db_value to return the value unmodified from the DB when using postgresql without a custom cls in decoder_kwargs.

Here are the changes to that method:

        def from_db_value(self, value, expression, connection):
            if value is None:
                return None
-            return   json.loads(value, **self.decoder_kwargs)
+            elif connection.vendor == 'postgresql' and self.decoder_kwargs.get('cls') is None:
+                return value
+            return json.loads(value, **self.decoder_kwargs)

At this point, I'm 70% sure that there is something wrong here. Before this commit, the code worked and django-jsonfield handed us deserialized dictionaries as expected. Afterwards, it hands us a plain text string of serialized JSON.
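To make the suspicion concrete, here is a standalone sketch of the before/after behavior of that method (simplified: vendor and decoder_kwargs become plain arguments, and the column is assumed to be text, so the driver hands back a plain string):

```python
import json

# Sketch of from_db_value BEFORE the commit: always decode.
def from_db_value_old(value, decoder_kwargs):
    if value is None:
        return None
    return json.loads(value, **decoder_kwargs)

# Sketch of from_db_value AFTER the commit: on postgresql with no custom
# decoder, trust the driver to have decoded a jsonb column already...
def from_db_value_new(value, vendor, decoder_kwargs):
    if value is None:
        return None
    if vendor == 'postgresql' and decoder_kwargs.get('cls') is None:
        return value
    return json.loads(value, **decoder_kwargs)

# ...but on a *text* column the driver hands back a plain string, so the
# new code returns it undecoded:
row = '{"-": "all"}'
print(type(from_db_value_old(row, {})))                # <class 'dict'>
print(type(from_db_value_new(row, 'postgresql', {})))  # <class 'str'>
```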

I'm also pretty curious about this code which is new in the same commit.

+    def select_format(self, compiler, sql, params):
+        if compiler.connection.vendor == 'postgresql' and self.decoder_kwargs.get('cls') is not None:
+            # Avoid psycopg2's automatic decoding to allow custom decoder
+            return '%s::text' % sql, params
+        return super().select_format(compiler, sql, params)

I'm not sure how to examine the compiler object without getting out the big guns and editing the code to add a pudb line in there, but after playing around with the objects I do have access to, I became fairly convinced that it was a no-op: self.decoder_kwargs.get('cls') is None, and the super method is a pass-through, which can be seen both at the REPL and in the code.

>>> super(JSONField, cdm).select_format(None, 1,2)
(1, 2)
    def select_format(self, compiler, sql, params):
        """
        Custom format for select clauses. For example, GIS columns need to be
        selected as AsText(table.col) on MySQL as the table.col data can't be
        used by Django.
        """
        return sql, params

source

But there is something wrong with the hypothesis that the change to from_db_value is causing the problem. You see, only the deserialization code was changed in that commit, not the serialization code. If that change really broke things, then round-tripping a value through the field should be broken too, but when I tested the field, it actually worked: if you put in a dict, you get a dict out. I also noticed something interesting: if you set the field to a str, you get a str out.

>>> dsc=DashboardStatsCriteria.objects.create()
>>> dsc.criteria_dynamic_mapping = "foo"
>>> dsc.criteria_dynamic_mapping
'foo'
>>> dsc.criteria_dynamic_mapping = {1:2}
>>> dsc.criteria_dynamic_mapping
{1: 2}
>>> dsc.criteria_dynamic_mapping = '{1:2}'
>>> dsc.criteria_dynamic_mapping
'{1:2}'

So my assumptions changed. The problem isn't that deserialization stopped working properly; the actual problem is that the value was previously serialized to the DB as a string, and now the DB is expected to hold some kind of native JSON. This is where things get really weird.

The django-jsonfield code for deciding what native db type to use can be seen here and it hasn't changed in 4 years:

    def db_type(self, connection):
        if connection.vendor == 'postgresql':
            # Only do jsonb if in pg 9.4+
            if connection.pg_version >= 90400:
                return 'jsonb'
            return 'text'
        if connection.vendor == 'mysql':
            return 'longtext'
        if connection.vendor == 'oracle':
            return 'long'
        return 'text'

source

It returns 'jsonb', and I see no reason why that should ever have changed. Postgres 9.4 is from 2014; nothing we run is that old.

>>> from django.db import connections
>>> con = connections['default']
>>> DashboardStatsCriteria.objects.all()[0]._meta.fields[4].db_type(con)
'jsonb'
>>> con.pg_version
100010
>>> con.vendor
'postgresql'

But when I loaded up a REPL on a pre-upgrade server, something shocking happened.

>>> from django.db import connections
>>> con = connections['default']
>>> DashboardStatsCriteria.objects.all()[0]._meta.fields[4].db_type(con)
'text'

But it's worse than that: the environment hasn't changed:

>>> con.pg_version
100010
>>> con.vendor
'postgresql'

A method that hasn't changed in 4 years, running in the exact same environment, gives me a different result.

But it's worse than that.

Locally, I have django-jsonfield 1.3.1 installed. The same version as is running on the broken server, yet, db_type returns 'text'!

After an hour of poking at it, a trip to the psychiatrist, and a liter of non-alcoholic bubbly liquid, I googled around and found out that there is this module in Python called inspect, which allows you to see the source of a method like so:

>>> import inspect
>>> # Server where db_type returns 'text'
>>> print(inspect.getsource(DashboardStatsCriteria.objects.all()[0]._meta.fields[4].db_type))
    def db_type(self, connection):
        """
        Return the database column data type for this field, for the provided
        connection.
        """
        # The default implementation of this method looks at the
        # backend-specific data_types dictionary, looking up the field by its
        # "internal type".
        #
        # A Field class can implement the get_internal_type() method to specify
        # which *preexisting* Django Field class it's most similar to -- i.e.,
        # a custom field might be represented by a TEXT column type, which is
        # the same as the TextField Django field type, which means the custom
        # field's get_internal_type() returns 'TextField'.
        #
        # But the limitation of the get_internal_type() / data_types approach
        # is that it cannot handle database column types that aren't already
        # mapped to one of the built-in Django field types. In this case, you
        # can implement db_type() instead of get_internal_type() to specify
        # exactly which wacky database column type you want to use.
        data = self.db_type_parameters(connection)
        try:
            return connection.data_types[self.get_internal_type()] % data
        except KeyError:
            return None
>>> # Server where db_type returns 'jsonb'
>>> print(inspect.getsource(DashboardStatsCriteria.objects.all()[0]._meta.fields[4].db_type))
    def db_type(self, connection):
        if connection.vendor == 'postgresql':
            # Only do jsonb if in pg 9.4+
            if connection.pg_version >= 90400:
                return 'jsonb'
            return 'text'
        if connection.vendor == 'mysql':
            return 'longtext'
        if connection.vendor == 'oracle':
            return 'long'
        return 'text'

WHAT??? On the old server, the source of db_type, the one that is inexplicably returning 'text', is totally different from the source I found on github and the source on the new 'broken' server. How did that happen?

OK, perhaps someone uploaded a different source to pypi. I try to examine versions, looking for an easy way to download a package from pypi and unpack it, or better, to match a git commit hash to a pypi package in a verifiable way. Have we been owned? Why is the old source code different? I try to look at git blame, to see if I somehow ended up with a more than 4 year old version, but that source code seems to be nowhere. Finally, locally, I start looking in my pipenv folder and find that weird source in the jsonfield directory. The full path to the file in question is as follows for me:

$ cat /home/timothy/pu/projects/auto-mat/klub/pipenv/.local/share/virtualenvs/klub-v-nnivlwsY/lib/python3.6/site-packages/jsonfield/fields.py
import copy
from django.db import models
from django.utils.translation import ugettext_lazy as _
try:
    from django.utils import six
except ImportError:
    import six

try:
    import json
except ImportError:
    from django.utils import simplejson as json

from django.forms import fields
try:
    from django.forms.utils import ValidationError
except ImportError:
    from django.forms.util import ValidationError

from .subclassing import SubfieldBase
from .encoder import JSONEncoder

class JSONFormFieldBase(object):
    def __init__(self, *args, **kwargs):
        self.load_kwargs = kwargs.pop('load_kwargs', {})
        super(JSONFormFieldBase, self).__init__(*args, **kwargs)

    def to_python(self, value):
        if isinstance(value, six.string_types) and value:
            try:
                return json.loads(value, **self.load_kwargs)
            except ValueError:
                raise ValidationError(_("Enter valid JSON"))
        return value

    def clean(self, value):

        if not value and not self.required:
            return None

        # Trap cleaning errors & bubble them up as JSON errors
        try:
            return super(JSONFormFieldBase, self).clean(value)
        except TypeError:
            raise ValidationError(_("Enter valid JSON"))

class JSONFormField(JSONFormFieldBase, fields.CharField):
    pass

class JSONCharFormField(JSONFormFieldBase, fields.CharField):
    pass

class JSONFieldBase(six.with_metaclass(SubfieldBase, models.Field)):

    def __init__(self, *args, **kwargs):
        self.dump_kwargs = kwargs.pop('dump_kwargs', {
            'cls': JSONEncoder,
            'separators': (',', ':')
        })
        self.load_kwargs = kwargs.pop('load_kwargs', {})

        super(JSONFieldBase, self).__init__(*args, **kwargs)

    def pre_init(self, value, obj):
        """Convert a string value to JSON only if it needs to be deserialized.

        SubfieldBase metaclass has been modified to call this method instead of
        to_python so that we can check the obj state and determine if it needs to be
        deserialized"""

        try:
            if obj._state.adding:
                # Make sure the primary key actually exists on the object before
                # checking if it's empty. This is a special case for South datamigrations
                # see: https://github.com/bradjasper/django-jsonfield/issues/52
                if getattr(obj, "pk", None) is not None:
                    if isinstance(value, six.string_types):
                        try:
                            return json.loads(value, **self.load_kwargs)
                        except ValueError:
                            raise ValidationError(_("Enter valid JSON"))

        except AttributeError:
            # south fake meta class doesn't create proper attributes
            # see this:
            # https://github.com/bradjasper/django-jsonfield/issues/52
            pass

        return value

    def to_python(self, value):
        """The SubfieldBase metaclass calls pre_init instead of to_python, however to_python
        is still necessary for Django's deserializer"""
        return value

    def get_prep_value(self, value):
        """Convert JSON object to a string"""
        if self.null and value is None:
            return None
        return json.dumps(value, **self.dump_kwargs)

    def _get_val_from_obj(self, obj):
        # This function created to replace Django deprecated version
        # https://code.djangoproject.com/ticket/24716
        if obj is not None:
            return getattr(obj, self.attname)
        else:
            return self.get_default()

    def value_to_string(self, obj):
        value = self._get_val_from_obj(obj)
        return self.get_db_prep_value(value, None)

    def value_from_object(self, obj):
        value = super(JSONFieldBase, self).value_from_object(obj)
        if self.null and value is None:
            return None
        return self.dumps_for_display(value)

    def dumps_for_display(self, value):
        return json.dumps(value, **self.dump_kwargs)

    def formfield(self, **kwargs):

        if "form_class" not in kwargs:
            kwargs["form_class"] = self.form_class

        field = super(JSONFieldBase, self).formfield(**kwargs)

        if isinstance(field, JSONFormFieldBase):
            field.load_kwargs = self.load_kwargs

        if not field.help_text:
            field.help_text = "Enter valid JSON"

        return field

    def get_default(self):
        """
        Returns the default value for this field.

        The default implementation on models.Field calls force_unicode
        on the default, which means you can't set arbitrary Python
        objects as the default. To fix this, we just return the value
        without calling force_unicode on it. Note that if you set a
        callable as a default, the field will still call it. It will
        *not* try to pickle and encode it.

        """
        if self.has_default():
            if callable(self.default):
                return self.default()
            return copy.deepcopy(self.default)
        # If the field doesn't have a default, then we punt to models.Field.
        return super(JSONFieldBase, self).get_default()

class JSONField(JSONFieldBase, models.TextField):
    """JSONField is a generic textfield that serializes/deserializes JSON objects"""
    form_class = JSONFormField

    def dumps_for_display(self, value):
        kwargs = {"indent": 2}
        kwargs.update(self.dump_kwargs)
        return json.dumps(value, **kwargs)

class JSONCharField(JSONFieldBase, models.CharField):
    """JSONCharField is a generic textfield that serializes/deserializes JSON objects,
    stored in the database like a CharField, which enables it to be used
    e.g. in unique keys"""
    form_class = JSONCharFormField

try:
    from south.modelsinspector import add_introspection_rules
    add_introspection_rules([], ["^jsonfield\.fields\.(JSONField|JSONCharField)"])
except ImportError:
    pass

The file is totally different from the one found on github.

I keep looking at our Pipfile.lock file and the pypi entry for django-jsonfield, and it is clear that the sources for that package should correlate to that github repo. Finally, I just start googling snippets of source code from the "wrong" fields.py file, and I find that there is, surprise surprise, another github repo with the same name. Now I'm really confused, though, because I was quite sure that pypi doesn't allow one package to overwrite another, and I don't see any change of ownership of the name django-jsonfield on pypi. Helpfully, however, the "other" django-jsonfield has the following paragraph in its README.rst file:

"Note: There are a couple of third-party add-on JSONFields for Django. This project is django-jsonfield here on GitHub but is named jsonfield on PyPI (https://pypi.python.org/pypi/jsonfield). There is another django-jsonfield on Bitbucket (https://bitbucket.org/schinckel/django-jsonfield), but that one is django-jsonfield on PyPI (https://pypi.python.org/pypi/django-jsonfield). I realize this naming conflict is confusing and I am open to merging the two projects."

Glancing at our Pipfile.lock again, I see that indeed we have entries for both jsonfield and django-jsonfield. Neither of these packages is explicitly specified in our Pipfile; both are pulled in as dependencies by 3rd party libs. We've had both forever, though, so why did things suddenly break? The answer is that with Pipenv the order of dependency installation is non-deterministic, and this specific pipenv lock just happened to change that order without any explanation. There was never any warning that the two libraries conflicted with each other. If we still used requirements.freeze files like we used to, such a random change in installation order would not have happened, but even then this bug would have been hard to trace.
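For what it's worth, on Python 3.8+ this kind of name clash can be detected after the fact by asking which installed distributions ship the same top-level package. A sketch, using the standard-library importlib.metadata:

```python
# A sketch of how to spot the clash: list every installed distribution
# that ships a top-level "jsonfield" package. More than one entry means
# two distributions are fighting over the same import name.
from importlib import metadata

owners = sorted({
    dist.metadata["Name"]
    for dist in metadata.distributions()
    if dist.metadata["Name"]
    and any(f.parts[0] == "jsonfield" for f in (dist.files or []))
})
# With both packages installed, this would print something like
# ['django-jsonfield', 'jsonfield'].
print(owners)
```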

Now all I need to do, I guess, is figure out how to mitigate this problem. I guess I should ask the Pipenv folks or something.

areski commented 5 years ago

Thanks for sharing your experience!