camgunz / cmp

An implementation of the MessagePack serialization format in C / msgpack.org[C]
http://msgpack.org
MIT License
336 stars 78 forks source link

The library is too perfect ;) #15

Closed Getty closed 9 years ago

Getty commented 9 years ago

Yes... thats the issue, ok, let me explain: The library is fully fulfilling the new spec, while most other implementations (Perl, JavaScript, to be specific) just support the old types. Which is something you dont notice AS LONG you never have a STR8 ... cause thats literally the only case where the stuff crash (And there are is no BIN support in the old spec, but that is not those implementations use it at all).

There is actually a very easy simple trick here to fix that, the library just needs to ignore STR8 for the packing and directly make STR16 instead. (2 comments, a define?) Out of the fact that I think many people will run into this problem if they mix the implementations (its REALLY not an obvious problem, took me 1-2 days to find out), i think it would be awesome if by default that would be part of the library (I can make the pull request, i mean its trivial), the question would only be how in my eyes (with define or different), but i think its crucial that the default behaviour just fits into "both ways".... the people still can enforce STR8 and they can still unpack STR8, there is no damage just the killing of problems of the bad state of the msgpack implementations.

What you think?

(P.S.: I am working on fixing the situation more general, but its hard to find responsible people ;), this is a first approach that would at least "fix" the situation)

antoinealb commented 9 years ago

I am personally against it. If other implementations are not compatible with the 2 year old "new" spec, that should be fixed in their code, not CMP's.

The Python version seems fine too:

from msgpack import unpackb
unpackb(bytes([0xd9, 2, 97, 97])) # 'aa'

Also there is already a public API to do what you are describing: directly use cmp_write_str16. Or comment those lines if you also want fixstr and str32: https://github.com/camgunz/cmp/blob/master/cmp.h#L280

antoinealb commented 9 years ago

Also speaking about Javascript, this seems to support str8: https://github.com/mcollina/msgpack5

Getty commented 9 years ago

Yes, it should be fixed in their code, the point is more of the overall experience for new users who don't actually understand the situation. The core problem here is really that the msgpack repositories seems to get no love from their maintainers, and I just try to find a good way of giving a good sync overall till all those other implementations are fixed. The big point here why i suggest this solution here is that there is no damage, it would be just a fix for the msgpack world.

About the JS: Thanks for pointing to this one, as I was using this one: https://github.com/msgpack/msgpack-javascript, still the problem is also in other libs, especially seen that this here: https://github.com/msgpack/msgpack/tree/perl-utf8-mode/msgpack this stuff was actually a general idea of something that all libs should integrate, and its using the old spec. The Perl lib for example is just using that spec, this central spec needs update, but I don't see any hope for now that someone actually cares for msgpack main repos at all (see issues and pull requests stacking everywhere).

I repeat: we talk here 100% about user experience, not about "thats how it technical should be". Those details make user avoid msgpack, and cmp specific is a crucial pillar in this existing world, as its the only microcontroller level C library that exist. I bet a lot of people ran into the problem without actually caring to solve it, and just using different serialization without any need.

antoinealb commented 9 years ago

I see. Well I guess if it is easily switchable for "advanced" users I am not against it.

Getty commented 9 years ago

Yeah, advanced users might even not care at all, STR8 or STR16 thats just one byte in "micro cosmos" of cases, who would care about that these days? ;) (Try to get SD cards under 4GB... my microcontroller product here has 256k flash, 64k RAM but I cant get less as 4 GB SD card hehe ;)

antoinealb commented 9 years ago

I know, but if it is in the spec, someone might need it, right ? :) And it's cool to see other people using CMP for microcontroller related work !

Getty commented 9 years ago

Well the old spec worked fine without it ;-) Its really very unnecessary and since i found out, i am really questioning why it was not added before (or other around, why someone thought adding it to the new spec and so break an else awesome compatibility level, was a good idea).

Using CMP for the microcontroller just makes sense. We have this "micro ARM" and we want to make a full rich web application. So i made it so that everything is msgpack, the web interface POST and PUTs msgpack's hash, so that i dont need to parse a string. Which also explains why it took a time till i noticed the problem, as I used only small strings for testing (smaller 32) which ends in FIXSTR which is also in old and new spec... really pervert situation ;). But without msgpack I would be lost. If you want I can give you access to the webinterface via simulator (Or you run the simulator yourself, its opensource https://github.com/LEDaquaristik/sunriser) so that you see what I do there ;) but just for someone who likes to see JS code ;-) hehe.

camgunz commented 9 years ago

Hey thanks for the compliment :) .

So the incompatibility here is that 0xD9 is reserved in v4, and str 8 in v5. The issue is that, while a CMP user can avoid using EXT and BIN data, you can't (and shouldn't be expected to) avoid using strings, and CMP's implementation automatically uses the smallest MP type possible, which sometimes is the unsupported str 8.

As @antoinealb already suggested, you could use the lower-level CMP functions to avoid using str 8.

I'm torn. On the one hand, I'm a believer in "be strict with what you send, be liberal in what you accept". I'm 100% sympathetic to the argument that many libraries are just ever so slightly incompatible and this would help tremendously.

On the other hand, every possible solution (except using the lower-level CMP functions) is crazy ugly.

I could add a switch in cmp_ctx_t which, when enabled, avoids str 8 stuff. That makes cmp_ctx_t fatter for 99% of users who don't need it though.

I could use preprocessor defines, but they're so ugly and require users to modify their build system, and the attendant documentation on CMP's side.

The approach I'm currently favoring is to have separate high-level functions that don't use str 8. I'd probably give them a suffix like cmp_write_object => cmp_write_object_v4 This approach has numerous benefits:

For what it's worth, my understanding is that the MP guys separated strings and binary data because of string keys in maps and automatic serialization in languages that have runtime type information. Without a standard way of specifying string encoding in raw data, ad-hoc schemes started causing interoperability issues.

Getty commented 9 years ago

Thats actually a good solution, combined with a big warning in the documentation, and best using it in the examples, referencing to the problem, so experts can use the "real" function if they want, but noobs would "copy example" and so come to the more compatible solution. That would be legit enough for me!

antoinealb commented 9 years ago

Linkers can remove unused functions automatically, no need to surround them with preprocessor defines. They make the code ugly and slightly harder to reason about.

My vote goes to a note in the documentation / example saying "use this function if your remote library doesn't support MessagePack5"

camgunz commented 9 years ago

Yeah I agree. OK, I'll rely on the end user to strip out stuff they don't need, whether they modify CMP themselves or use linker options or whatever other method.

Getty commented 9 years ago

It is totally enough if we add the new methods and use that new methods in the examples and warn about the situation. I could do that pully (after my stress), just want to be sure we are all on the same page.

camgunz commented 9 years ago

OK this is in 0ab00f6. I elected to keep the compatibility functions out of the examples, etc., because v4 should die and libraries that are unmaintained/incompatible should thus perish. I'm still a little wary of this kind of thing re: map keys, but there's only so much hand-holding I'm willing to do. Thanks for the idea, and sorry it took so long!