fangq / jsonlab

JSONLab: compact, portable, robust JSON/binary-JSON encoder/decoder for MATLAB/Octave
http://iso2mesh.sf.net/jsonlab
BSD 3-Clause "New" or "Revised" License

Float issue with py-ubjson file #25

Closed rconan closed 7 years ago

rconan commented 8 years ago

Hi, I am trying to import a ubj file generated with py-ubjson into Matlab. Everything works fine except for floats. Any idea why?

In Python:

import ubjson
d = {'string': 'test', 'int': 123456, 'pi': 3.1415}
ubjson.dump(d, open('test.ubj', 'w'))

In Matlab:

s = loadubjson('test.ubj')
s =
       int: 123456
        pi: 1.0965e+227
    string: [116 101 115 116]

fangq commented 8 years ago

I looked into this. It looks like py-ubjson saved the floating-point value (a double in this case) in big-endian format, even though the host machine is little-endian. To show this, you can run the following in Matlab:

fid=fopen('ieee-le.bin','wb');
fwrite(fid,3.1415,'double');
fclose(fid);
fid=fopen('ieee-be.bin','wb');
fwrite(fid,3.1415,'double','ieee-be');
fclose(fid);

and then dump the hex data from the output file:

fangq@wazu:~$ hexdump -C ieee-le.bin
00000000  6f 12 83 c0 ca 21 09 40                           |o....!.@|
00000008
fangq@wazu:~$ hexdump -C ieee-be.bin
00000000  40 09 21 ca c0 83 12 6f                           |@.!....o|
00000008

in comparison, the hex dump for the py-ubjson output is shown here:

fangq@wazu:~$ hexdump -C ~/space/Library/py-ubjson/test.ubj 
00000000  7b 55 03 69 6e 74 6c 00  01 e2 40 55 02 70 69 44  |{U.intl...@U.piD|
00000010  40 09 21 ca c0 83 12 6f  55 06 73 74 72 69 6e 67  |@.!....oU.string|
00000020  5b 24 55 23 55 04 74 65  73 74 7d                 |[$U#U.test}|
0000002b

The big-endian double (40 09 21 ca c0 83 12 6f, immediately after the D marker) is evident in the first half of the second row.
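The same reading can be reproduced outside Matlab. The sketch below uses only Python's standard `struct` module; the byte string is copied from the hex dump above, and the offsets of the `l` and `D` markers are read off that dump by hand rather than found by a real UBJSON parser.

```python
import struct

# The 43 bytes of test.ubj, copied from the hex dump above.
data = bytes.fromhex(
    "7b5503696e746c0001e2405502706944"
    "400921cac083126f5506737472696e67"
    "5b2455235504746573747d"
)

# Offsets read off the dump: 'l' marker at byte 6, 'D' marker at byte 15.
assert data[6:7] == b"l" and data[15:16] == b"D"
int_payload = data[7:11]   # 4 bytes following 'l'
pi_payload = data[16:24]   # 8 bytes following 'D'

int_value = struct.unpack(">l", int_payload)[0]  # int32 read as big-endian
pi_big = struct.unpack(">d", pi_payload)[0]      # float64 read as big-endian
pi_little = struct.unpack("<d", pi_payload)[0]   # float64 read as little-endian

print(int_value)  # 123456
print(pi_big)     # 3.1415
print(pi_little)  # roughly 1.0965e+227, the value reported in the issue
```

Reading the eight payload bytes as big-endian recovers 3.1415, while reading them as little-endian gives the huge value that Matlab printed.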

The UBJSON spec (Draft 12) is not exactly clear on the byte order for floats. It does say explicitly that all integer values are saved in big-endian format, but for floats (d and D), the only requirement mentioned is IEEE 754 compliance.

http://ubjson.org/type-reference/value-types/#numeric-byte-order-endianness

However, AFAIK, IEEE 754 does not specify Endianness.
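As a small illustration of that point, the following Python sketch (standard `struct` module only, not part of py-ubjson or JSONLab) serializes the same IEEE 754 double in both byte orders. The bit pattern is identical; only the order in which the eight bytes are written differs.

```python
import struct

value = 3.1415

# Same IEEE 754 double, serialized in the two byte orders.
big = struct.pack(">d", value)     # most significant byte first
little = struct.pack("<d", value)  # least significant byte first

print(big.hex())     # 400921cac083126f
print(little.hex())  # 6f1283c0ca210940

# The two encodings are byte-for-byte reverses of each other.
assert big == little[::-1]
```

These match the two hexdump outputs of ieee-be.bin and ieee-le.bin shown earlier.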

The UBJSON spec does say "all numeric values be written in Big-Endian order" in this document:

http://ubjson.org/#endian

but that statement is very brief and non-specific. The only detailed byte-order requirement is the one for big-endian integers.

I am going to ask the UBJSON maintainer for clarification. If d/D are indeed meant to be saved in big-endian format, then I will remove the "id<=5" condition from line https://github.com/fangq/jsonlab/blob/master/loadubjson.m#L317.
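The MATLAB code at that line is not reproduced here, so the following is only a hypothetical Python model of what dropping the condition would mean. It assumes the numeric markers are indexed in the order i, U, I, l, L, d, D, that the host is little-endian, and that bytes are treated as big-endian only for indices up to some limit. The names `MARKERS`, `FORMATS`, `decode_number`, and `swap_limit` are illustrative and do not come from JSONLab.

```python
import struct

# Assumed ordering of UBJSON numeric markers: integers first (ids 1-5), floats last (ids 6-7).
MARKERS = "iUIlLdD"
# struct codes of matching width: int8, uint8, int16, int32, int64, float32, float64.
FORMATS = "bBhiqfd"

def decode_number(marker, payload, swap_limit):
    """Decode one numeric payload on an assumed little-endian host.

    Types whose 1-based id is <= swap_limit are read as big-endian;
    the remaining types are read in host (little-endian) order.
    """
    type_id = MARKERS.index(marker) + 1
    order = ">" if type_id <= swap_limit else "<"
    return struct.unpack(order + FORMATS[type_id - 1], payload)[0]

pi_bytes = bytes.fromhex("400921cac083126f")

# With the limit at 5, doubles are left unswapped and come out wrong.
print(decode_number("D", pi_bytes, swap_limit=5))
# With the limit lifted to cover all seven types, the double decodes as 3.1415.
print(decode_number("D", pi_bytes, swap_limit=7))
```

Under these assumptions, integers decode correctly with either limit, and only the d/D values change when the restriction is lifted.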