alavrik / piqi

Piqi – universal schema language: JSON, XML, Protocol Buffers data validation and conversion
http://piqi.org
Apache License 2.0

Dashes in json identifiers #34

Open stroop23 opened 11 years ago

stroop23 commented 11 years ago

The JSON parser in 0.6.4 does not allow dashes in JSON identifiers.

Given the following piqi definition (in dash.piqi):

.record [
        .name test-record
        .field [
                .name some-field
                .type string
                .code 1
        ]
]

I would expect the following:

$ echo "{\"some-field\":\"some-value\"}" | piqi convert -f json -t piq --type dash/test-record
:dash/test-record [ .some-field "some-value" ]

What actually happens is:

$ echo "{\"some-field\":\"some-value\"}" | piqi convert -f json -t piq --type dash/test-record
:1:2: Expected string identifier but found '"'

And the following should fail, but interestingly succeeds in parsing AND converting:

$ echo "{\"some_field\":\"some-value\"}" | piqi convert -f json -t piq --type dash/test-record
:dash/test-record [ .some-field "some-value" ]

But the output DOES conform to the Piqi definition.

motiejus commented 11 years ago

I reproduced this. One more observation: the same field name with a dash is accepted in XML.

$ echo '<value><some-field>some-value</some-field></value>' | \
    piqi convert -f xml -t piq --type dash/test-record
:dash/test-record [ .some-field "some-value" ]

alavrik commented 11 years ago

This is the intended behavior. Moreover it has always worked this way and it is also mentioned in the doc:

... JSON field names are derived from Piqi field names by replacing all - characters with _.

That said, I've been thinking about allowing arbitrary characters in JSON field names, but only when they are explicitly specified via the json-name property. The way json-name is derived from the Piqi field name is unlikely to change, because underscores in JSON field names are a reasonable default.
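
To make the rule concrete, here is a minimal Python sketch of the derivation described above, together with the explicit override being considered. It is not Piqi's actual implementation: the function `json_field_name` and its `json_name` parameter are hypothetical and only stand in for the json-name property.

```python
from typing import Optional


def json_field_name(piqi_name: str, json_name: Optional[str] = None) -> str:
    """Return the JSON field name for a Piqi field (illustrative only)."""
    # Hypothetical override: an explicitly given json-name is used verbatim.
    if json_name is not None:
        return json_name
    # Documented default: replace every '-' with '_'.
    return piqi_name.replace("-", "_")


print(json_field_name("some-field"))                          # some_field
print(json_field_name("some-field", json_name="some-field"))  # some-field
```

With the default, `some-field` from dash.piqi is expected as `some_field` in JSON, which matches the behaviour reported at the top of this issue.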

stroop23 commented 11 years ago

I admit that I have not seen that piece of documentation. Since JSON allows any string as a key name, wouldn't the reasonable default be to leave field names unmangled? Where does this behaviour come from? Why leave names as-is in one encoding but change them in another? I cannot see the use in adding an attribute for renaming a field, but I do see the use (a requirement, even!) in staying with the definition.

motiejus commented 11 years ago

I agree with Dennis: the main issue here is consistency. Why is it <my-field>1</my-field> in XML but {"my_field": 1} in JSON? It is ambiguous for both server and client developers.

I carefully read the docs on encodings, and this dash-underscore translation in JSON looks arbitrary. I assume there were non-obvious technical reasons to do that. Could you highlight them?

alavrik commented 11 years ago

Two points. First, unlike XML, JSON maps nicely to programming languages (e.g. JavaScript). However, most languages don't allow dashes in identifiers. For this reason, it makes a lot of sense to follow a conventional style for naming JSON field identifiers, one that uses underscores. Second, XML doesn't map to programming languages as seamlessly as JSON. Usually, some level of transformation is required anyway. Because of that, I decided to go with dashes in XML identifiers -- they just look nicer. Haven't really heard any complaints from Piqi/XML users so far.

stroop23 commented 11 years ago

JSON maps nicely only to JavaScript; for all other languages it has to go through a parsing and transformation process similar to XML, so there is no difference there. As for JavaScript, it does NOT need this translation either: dashes parse just fine and are equally usable. If you want this behaviour to be available, it should be optional (like --normalize-names); otherwise identifiers should remain as they are.
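
Both sides of this argument can be seen in a few lines of Python, used here purely as an illustration (nothing below comes from Piqi). A standard JSON parser accepts a dashed key and it is usable as a dictionary key, but the same name is not a valid identifier, which is the concern behind the underscore convention.

```python
import json

# A dashed key parses without any problem and works as a dictionary key.
record = json.loads('{"some-field": "some-value"}')
print(record["some-field"])          # some-value

# The dashed name is not a valid identifier, so it cannot be used for
# attribute-style access; the underscore form can.
print("some-field".isidentifier())   # False
print("some_field".isidentifier())   # True
```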

alavrik commented 11 years ago

@spil-dennis suppose you are right and I understand that you like dashes in identifiers even more than I do. How would you convince existing Piqi users that it is worth breaking backward compatibility of existing protocols and existing distributed applications that rely on identifiers with underscores?

If you really like dashes in JSON field names you can contribute optional support for it. I'll be happy to merge it.

stroop23 commented 11 years ago

I don't care about dashes or underscores either way, only that there should be no arbitrary divergence from the specification. Piqi uses dashes, so it's a bit strange to use them everywhere except in one encoding. As for backwards compatibility, obviously such a change should take the form of an option that defaults to the current behaviour.

So, to conclude this discussion: I will make an updated patch that allows switching this behaviour.
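
As a rough idea of what such a switch could look like, here is a Python sketch, not the actual patch. The function `encode_field_names` and its `normalize_names` flag are hypothetical names, and the flag defaults to the current underscore behaviour so that existing users are not affected.

```python
def encode_field_names(record: dict, normalize_names: bool = True) -> dict:
    """Return a copy of `record` keyed by JSON field names (illustrative only)."""
    if not normalize_names:
        # Opt-in behaviour: keep the Piqi field names exactly as defined.
        return dict(record)
    # Default, backward-compatible behaviour: '-' becomes '_'.
    return {name.replace("-", "_"): value for name, value in record.items()}


piqi_record = {"some-field": "some-value"}
print(encode_field_names(piqi_record))                         # {'some_field': 'some-value'}
print(encode_field_names(piqi_record, normalize_names=False))  # {'some-field': 'some-value'}
```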

alavrik commented 11 years ago

Great! Thanks.