FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

Honor READ_ENUMS_USING_TO_STRING feature when deserializing #397

Open itsmoonrack opened 1 year ago

itsmoonrack commented 1 year ago

Currently the ProtobufParser uses a condition (isStdEnum) to decide whether to deserialize an enum value in index form or string form:

```java
case ENUM:
    // 12-Feb-2015, tatu: Can expose as index (int) or name, but internally encoded as VInt.
    //    So for now, expose as is; may add a feature to choose later on.
    // But! May or may not be directly mapped; may need to translate
    {
        int ix = _decodeLength();
        if (_currentField.isStdEnum) {
            _numberInt = ix;
            _numTypesValid = NR_INT;
            type = JsonToken.VALUE_NUMBER_INT;
        } else {
            // Could translate to better id, but for now let databind
            // handle that part
            String enumStr = _currentField.findEnumByIndex(ix);
            if (enumStr == null) {
                _reportErrorF("Unknown id %d (for enum field %s)", ix, _currentField.name);
            }
            type = JsonToken.VALUE_STRING;
            _textBuffer.resetWithString(enumStr);
        }
    }
    break;
```

We could honor the READ_ENUMS_USING_TO_STRING feature so that the parser returns the string form when this feature is enabled.
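To make the proposal concrete, here is a minimal, self-contained model of the ENUM branch with a switch that forces the string form. It is a sketch under stated assumptions, not Jackson code: `EnumBranchSketch`, `EnumField`, `Decoded`, `Token` and the `exposeNames` flag are hypothetical stand-ins for the parser's field metadata and for whichever setting would end up controlling this. Only `isStdEnum`, `findEnumByIndex` and the error message come from the snippet.

```java
import java.util.Map;

// Hypothetical, simplified model of the ENUM branch in ProtobufParser.
// Not Jackson code: all type names and the exposeNames flag are illustrative.
public class EnumBranchSketch {

    enum Token { VALUE_NUMBER_INT, VALUE_STRING }

    // Stand-in for the parser's field metadata: index -> symbolic name from the .proto file
    record EnumField(String name, boolean isStdEnum, Map<Integer, String> namesByIndex) {
        String findEnumByIndex(int ix) {
            return namesByIndex.get(ix);
        }
    }

    // Result of decoding one enum value: either an int index or a String name
    record Decoded(Token type, int intValue, String text) { }

    static Decoded decodeEnum(EnumField field, int ix, boolean exposeNames) {
        // Current behavior: "standard" enums are exposed as their int index...
        if (field.isStdEnum() && !exposeNames) {
            return new Decoded(Token.VALUE_NUMBER_INT, ix, null);
        }
        // ...otherwise (or when the hypothetical flag is on) expose the symbolic name
        String enumStr = field.findEnumByIndex(ix);
        if (enumStr == null) {
            throw new IllegalArgumentException(
                String.format("Unknown id %d (for enum field %s)", ix, field.name()));
        }
        return new Decoded(Token.VALUE_STRING, 0, enumStr);
    }

    public static void main(String[] args) {
        EnumField status = new EnumField("status", true, Map.of(0, "UNKNOWN", 1, "ACTIVE"));
        System.out.println(decodeEnum(status, 1, false).type()); // VALUE_NUMBER_INT
        System.out.println(decodeEnum(status, 1, true).text());  // ACTIVE
    }
}
```

The sketch only changes which token is emitted; mapping the String to a Java enum constant is still left to databind, as the comment in the original snippet intends.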

cowtowncoder commented 1 year ago

I guess I was originally thinking that since the databind-level enum String is not available, this is not doable. The proto file does, however, have symbolic names, so these could be surfaced. The question then becomes whether READ_ENUMS_USING_TO_STRING is the right setting to use: its semantics are slightly different (namely, Enum.name() vs Enum.toString()), so I don't think that would work.
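The semantic difference shows up as soon as an enum overrides toString(). The enum below is purely illustrative (not from the issue): with READ_ENUMS_USING_TO_STRING enabled, databind matches incoming Strings against toString() values, whereas a .proto symbolic name would line up with name() only if, as assumed here, the Java constants mirror the .proto names.

```java
// Illustrative enum (not from the issue) whose toString() is overridden,
// so name() and toString() return different Strings.
public class NameVsToString {

    enum Color {
        RED, GREEN;

        @Override
        public String toString() {
            // e.g. a display label instead of the constant name
            return name().toLowerCase();
        }
    }

    public static void main(String[] args) {
        System.out.println(Color.RED.name());     // RED
        System.out.println(Color.RED.toString()); // red
    }
}
```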

So I think that while allowing return-as-String makes sense, we should probably add a Protobuf-specific feature instead. That would be added as ProtobufParser.Feature (no features exist yet, so the type itself would need to be added). This assumes that we keep track of enum String values.

New feature should be disabled by default for backwards compatibility.
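One possible shape for such a feature, loosely following the bitmask pattern other Jackson format features use (`enabledByDefault()`, `getMask()`, `enabledIn()`), is sketched below. The `EXPOSE_ENUM_NAMES` constant and the standalone `ProtobufFeatureSketch` class are hypothetical; per the comment above, no ProtobufParser features exist yet, and a real implementation would presumably plug into Jackson's own feature infrastructure rather than stand alone.

```java
// Hypothetical sketch of a ProtobufParser.Feature; the constant name and
// this class are illustrative only, not part of jackson-dataformat-protobuf.
public class ProtobufFeatureSketch {

    enum Feature {
        // Expose enum values as their .proto symbolic names instead of int indexes.
        // Disabled by default so existing behavior is unchanged.
        EXPOSE_ENUM_NAMES(false);

        private final boolean defaultState;
        private final int mask;

        Feature(boolean defaultState) {
            this.defaultState = defaultState;
            this.mask = 1 << ordinal(); // one bit per feature
        }

        // Bitmask of all features that are enabled by default
        static int collectDefaults() {
            int flags = 0;
            for (Feature f : values()) {
                if (f.enabledByDefault()) {
                    flags |= f.getMask();
                }
            }
            return flags;
        }

        boolean enabledByDefault() { return defaultState; }
        int getMask() { return mask; }
        boolean enabledIn(int flags) { return (flags & mask) != 0; }
    }

    public static void main(String[] args) {
        int flags = Feature.collectDefaults();
        System.out.println(Feature.EXPOSE_ENUM_NAMES.enabledIn(flags)); // false
        flags |= Feature.EXPOSE_ENUM_NAMES.getMask();                   // explicit opt-in
        System.out.println(Feature.EXPOSE_ENUM_NAMES.enabledIn(flags)); // true
    }
}
```

Because the default mask leaves the bit cleared, existing users keep getting int indexes for standard enums unless they opt in.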

itsmoonrack commented 1 year ago

Ah yes, you are right; the meanings may be different.

I see two ways to resolve this. Either we accept that, from a port-adapter perspective, what we want is the symbolic name (because we expect the mapping to be defined in the proto file), in which case READ_ENUMS_USING_TO_STRING is close enough to what we want; or we go with the ProtobufParser.Feature you proposed.

I have a preference for the former (the standard feature): even if it is not strictly equivalent to what we want, it is close enough not to confuse people. Our enums are generated, so in our case we do not differentiate toString() from name().

cowtowncoder commented 1 year ago

Actually, there's another technical reason why I am pretty firmly for adding a new Protobuf-specific feature: DeserializationFeature is not accessible from ProtobufParser, since it is a databind-level feature. So even if I accepted the semantic difference, it's not actually doable. :)