agronholm / cbor2

Python CBOR (de)serializer with extensive tag support
MIT License
228 stars 59 forks source link

Moved global decoder dicts to decoder object #225

Open theeldermillenial opened 8 months ago

theeldermillenial commented 8 months ago

This PR moves a dict of decoders from a global context to a class context.

The purpose of this is to make subclassing cbor2.CBORDecoder for modification.

This PR assists in resolving #224 as an intermediate step.

coveralls commented 8 months ago

Coverage Status

coverage: 93.542%. remained the same when pulling 350c0b38fec593dd9b9c38189ff6f32ad6fac87f on theeldermillenial:feat/local-decoders into c4401179acc257f5442f42bd369fb4850df909bd on agronholm:master.

theeldermillenial commented 8 months ago

@agronholm sorry for tagging you but I couldn't assign you as a reviewer.

theeldermillenial commented 8 months ago

Hey @agronholm

Just wanted to bump this to make sure it doesn't fly under the radar.

theeldermillenial commented 8 months ago

@agronholm could you please take a look at this?

agronholm commented 7 months ago

This PR only seems to do this for the pure Python variant, not the C decoder?

theeldermillenial commented 7 months ago

@agronholm Yes

agronholm commented 7 months ago

Well, that's not good. I can't have the two different implementations deviate any further.

theeldermillenial commented 7 months ago

Does this implementation do anything that would result in a meaningful difference between Python and C?

As far as I can tell, it does not. I'm not changing anything about how the decoder actually works. I'm just changing the scope of the function calling dictionary. There are obvious differences between the Python and C versions strictly because they are written in different languages, right? I don't see how changing the scope of the function calling dict would be any different.

theeldermillenial commented 7 months ago

If you want to suggest things I can do to align the C version of the code with Python, I am happy to implement it.

agronholm commented 7 months ago

I think the proper place to start might be adding these new properties to CBORDecoder_getsetters in decoder.c. I don't have much bandwidth to help you with general C extension development, however. The C code was mostly contributed by another developer anyway.

theeldermillenial commented 7 months ago

I have developed c-extensions in the past, so that's not an issue. I can contribute for sure, but I don't think a contribution make sense here because this seems to be just a language difference.

I just don't see the functional difference for the scope change because the function dictionary is only ever internally called. The current way that the Python code was written, having that dictionary be external created issues if you wanted to subclass it (which you can't do with the C-extension anyway).

If you're really looking for feature parity between these two things, the way to approach it would be a substantial rewrite of either the Python code or the C-extension.

  1. On the Python side, you would want to eliminate all classes and make it a functional library.
  2. Alternatively, you could turn the C-extension into a c++ extension and then you could use pybind (which is what I would recommend if you really want true feature parity).

Regardless, the scoping of that dictionary is arbitrary given the current implementation of both the C and Python sections of the code.

Please correct me if I'm wrong here, but this seems like an arbitrary change. I'm definitely happy to modify the c-extension if that's what is required.

agronholm commented 7 months ago

The current way that the Python code was written, having that dictionary be external created issues if you wanted to subclass it (which you can't do with the C-extension anyway).

Wait, what?

>>> from _cbor2 import CBORDecoder
>>> class A(CBORDecoder):
...   pass
... 
>>> dir(A)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'decode', 'decode_array', 'decode_bigfloat', 'decode_bytestring', 'decode_datetime_string', 'decode_epoch_datetime', 'decode_float16', 'decode_float32', 'decode_float64', 'decode_fraction', 'decode_from_bytes', 'decode_ipaddress', 'decode_ipnetwork', 'decode_map', 'decode_mime', 'decode_negative_bignum', 'decode_negint', 'decode_positive_bignum', 'decode_rational', 'decode_regexp', 'decode_self_describe_cbor', 'decode_semantic', 'decode_set', 'decode_shareable', 'decode_sharedref', 'decode_simple_value', 'decode_special', 'decode_string', 'decode_stringref', 'decode_stringref_namespace', 'decode_uint', 'decode_uuid', 'fp', 'immutable', 'object_hook', 'read', 'set_shareable', 'str_errors', 'tag_hook']

If you subclass the C CBORDecoder class, and it has setters/getters for these decoder maps, then isn't that what you want?

theeldermillenial commented 7 months ago

My apologies. I'll take a look at this.

theeldermillenial commented 7 months ago

I'm having issues building the c-extension.

It looks like a few things were changed in the c-extension that are causing the build to fail. I was wondering if it was just me, but it looks like the git actions failed to build it as well.

I am capable of building 5.6.2, but 5.6.3 seemed to create two changes that are causing the build to fail:

  1. There is a missing declaration in a header file for _CBOR2_date_ordinal_offset. This is an easy fix. Error for reference:

    source/decoder.c: In function ‘CBORDecoder_decode_epoch_date’:
    source/decoder.c:1346:41: error: ‘_CBOR2_date_ordinal_offset’ undeclared (first use in this function)
    1346 |             ordinal = PyNumber_Add(num, _CBOR2_date_ordinal_offset);
      |       
  2. The way dates are parsed changed to use the Python builtin date object, and it's having issues that are not immediately apparent to me how to fix. Error for reference:

    source/decoder.c:1348:50: error: expected expression before ‘PyDateTime_Date’
    1348 |                 ret = PyObject_CallMethodObjArgs(PyDateTime_Date, _CBOR2_str_fromordinal, ordinal, NULL);
      |                                                  ^~~~~~~~~~~~~~~
    source/decoder.c:1348:23: error: too few arguments to function ‘PyObject_CallMethodObjArgs’
    1348 |                 ret = PyObject_CallMethodObjArgs(PyDateTime_Date, _CBOR2_str_fromordinal, ordinal, NULL);
      |     

I'm compiling on: Python - 3.10 gcc - 11.4 Ubuntu - 22.04

Any suggestions on how to resolve 2?

agronholm commented 7 months ago

It seems like there's a deeper problem that allows the failure to build the C extension to pass unnoticed. I'm making the appropriate changes to the tox and GitHub actions workflows to prevent this from happening in the future.

agronholm commented 7 months ago

As expected, this broke the CI. Well, this won't catch me by surprise ever again.

agronholm commented 7 months ago

Problem solved.