beenotung / compress-json

Store JSON data in space efficient manner
https://www.npmjs.com/package/compress-json
BSD 2-Clause "Simplified" License
108 stars 6 forks source link

Compress / Decompress Number Divergence #3

Closed manu-de-verdun closed 3 years ago

manu-de-verdun commented 3 years ago

Hi,

Just find one issue during the compress / decompress process, in node

perscore values are different after completing a cycle

(the input is the result of a BigQuery query)

a workaround was found using Round in the SQL query to limit the number size

Input:

[
 {
  "sub": 1,
  "factors": [
   {
    "name": 0,
    "metric": 15,
    "percscore": 0.12371134020618557
   },
   {
    "name": 1,
    "metric": 4,
    "percscore": 0.032989690721649485
   },
   {
    "name": 2,
    "metric": 22,
    "percscore": 0.18144329896907216
   },
  ],
  "avg_score": 2.1651785714285716,
 }
]

outPut

[
 {
  "sub": 1,
  "factors": [
   {
    "name": 0,
    "metric": 15,
    "percscore": 0.03371134020618557
   },
   {
    "name": 1,
    "metric": 4,
    "percscore": 0.002989690721649485
   },
   {
    "name": 2,
    "metric": 22,
    "percscore": 0.4814432989690722
   },
  ],
  "avg_score": 2.1651785714285716,
 }
] 

thanks

beenotung commented 3 years ago

It seems due to precision limitation of number in javascript.

For example, when encoding 12.34, this library split it into 12 and 43 (decimal place is reversed to capture leading zeros). The 12 is then converted into integer and then encoded as C, and 43 is converted into integer and then encoded as h.

When encoding 0.12371134020618557, it is splitted into 0 and 75581602043117321. When the string 75581602043117321 is converted into integer, the values becomes 75581602043117330 (look at the last 5 digits, 17321 becomes 17330). This is due to limited precision in number representation in javascript.

One way to fix this issue is to encode the numerical string directly, without converting to integer. This may hurt performance. Another way is to convert the numerical string into BigInt. According to caniuse.com and MDN, it seems most environment already support BigInt. will explore fixing with BigInt and observe the impact on performance.

Thanks for the bug report!

beenotung commented 3 years ago

The reported bug is fixed https://github.com/beenotung/compress-json/commit/3f3bc2f41451a3573255bc30f54643a5af836545 and and patched version is release as compress-json@1.0.5

@manu-de-verdun you may close the issue if confirmed

beenotung commented 3 years ago

I'm closing this issue because I think it has been resolved. You're welcomed to re-open the issue if you're still encountering related bugs. Thank you.