appy-one / acebase-client

Client to connect to a remote AceBase NoSQL database server
MIT License

Issue #25: reading/writing binary data

Closed. donl closed this issue 1 year ago.

donl commented 1 year ago

Using regular AceBase seems fine; however, in a client/server setup there appears to be an issue with binary data.

Here is an example that hopefully illustrates the issue:

import { AceBase } from "acebase";
import { AceBaseClient } from "acebase-client";
import { AceBaseServer } from "acebase-server";

import crypto from "crypto";

const options = {
  path: "./server",
  host: "localhost",
  port: 5758,
  authentication: {
    enabled: false,
  },
};

const server = new AceBaseServer("default", options);

server.db.ref("*").on("child_added", async (snap) => {
  const val = snap.val();
  if (val.binary) {
    const ab4 = new Uint8Array(val.binary);

    const hashBuffer2 = await crypto.subtle.digest("SHA-256", ab4);
    console.log("ab4", hex(hashBuffer2));
    // console.log('ab4',hex(ab4));
  } else {
    console.log("Event - child_added: ", val);
  }
});

//
//

const byteToHex = [];

for (let n = 0; n <= 0xff; ++n) {
  const hexOctet = n.toString(16).padStart(2, "0");
  byteToHex.push(hexOctet);
}

function hex(arrayBuffer) {
  const buff = new Uint8Array(arrayBuffer);
  const hexOctets = []; // new Array(buff.length) is even faster (preallocates necessary array size), then use hexOctets[i] instead of .push()

  for (let i = 0; i < buff.length; ++i) hexOctets.push(byteToHex[buff[i]]);

  return hexOctets.join("");
}

//
//

let settings = {
  host: "localhost",
  port: 5758,
  https: false,
  dbname: "default",
  logLevel: "verbose",
};

// const db = new AceBase();  // seems to work as expected...
const db = new AceBaseClient(settings);
await db.ready();

const url =
  "https://www.google.com/logos/doodles/2022/labor-day-2022-6753651837109490.3-l.png";

const response = await fetch(url);
if (!response.ok) {
  throw new Error(`HTTP error, status = ${response.status}`);
}

const blob = await response.blob();

const arrayBuffer = await blob.arrayBuffer();

const hashBuffer = await crypto.subtle.digest("SHA-256", arrayBuffer);
console.log("ab", hex(hashBuffer));
// console.log('ab',hex(arrayBuffer));

const binary = new Uint8Array(arrayBuffer);

const newFileRef = db.ref("files").push();
await newFileRef.set({
  name: new Date().toString(),
  type: blob.type,
  binary: binary,
});

const snap = await db.ref(newFileRef.path).get();
const val = snap.val();
const ab2 = new Uint8Array(val.binary);

const hashBuffer2 = await crypto.subtle.digest("SHA-256", ab2);
console.log("ab2", hex(hashBuffer2));
// console.log('ab2',hex(ab2));

const aceBlob = new Blob([ab2], { type: val.type });

const ab3 = await aceBlob.arrayBuffer();
const hashBuffer3 = await crypto.subtle.digest("SHA-256", ab3);
console.log("ab3", hex(hashBuffer3));
// console.log('ab3',hex(ab3));

db.close();
server.shutdown();
appy-one commented 1 year ago

Hi Don,

I'm trying to run your code snippet, but I get a TypeError: Cannot read properties of undefined (reading 'digest') on the line const hashBuffer = await crypto.subtle.digest("SHA-256", arrayBuffer);

Also, I had to install and import node-fetch to get the fetch on the image to work.

Can you edit the code snippet to isolate the exact issue?

donl commented 1 year ago

I'm running Node v18.7.0; it warns that fetch is experimental, but it runs, and I believe crypto.subtle is fairly new in Node as well. I had to specify "type": "module" in package.json for the import syntax.
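
In case it helps while reproducing: crypto.subtle was only added as a top-level export of the "crypto" module around Node 17.4, so on older versions it reads as undefined. A minimal sketch, assuming Node 15+ where crypto.webcrypto exists, that computes the same digest either way:

import { webcrypto } from "node:crypto";

// Same Web Crypto implementation, reached via the webcrypto export so it
// also works where crypto.subtle is undefined (Node 15.x through 17.3).
const digest = await webcrypto.subtle.digest("SHA-256", new Uint8Array([1, 2, 3]));
console.log(Buffer.from(digest).toString("hex"));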

The gist of the problem: when storing binary data from a PNG, for instance, the retrieved result ends up a handful of bytes larger than the original, with the bytes mismatching after a certain (seemingly random?) offset.

It has worked OK with data from a small textual SVG file, and even with small PNG data (<2000 bytes?).
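
To make "mismatching after a certain offset" concrete, here is a minimal sketch of the check I have in mind; firstMismatch and the two argument names are just illustrative:

// Report the first byte index at which two Uint8Arrays diverge,
// or -1 if they are identical in both content and length.
function firstMismatch(original, roundTripped) {
  const shared = Math.min(original.length, roundTripped.length);
  for (let i = 0; i < shared; i++) {
    if (original[i] !== roundTripped[i]) return i;
  }
  // The common prefix matches; any remaining difference is one of length.
  return original.length === roundTripped.length ? -1 : shared;
}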

I'll see if I can rework the example for you.

Thanks!

donl commented 1 year ago

... the hashing was just an "easy" way to visually compare the binary data results. The console.log lines with the hex() of the original binary data object would show the actual data.

appy-one commented 1 year ago

Thanks, I switched to Node 18.8.0 to test.

appy-one commented 1 year ago

It looks like the ascii85 encoding used for transport is causing an issue. I'll have to investigate this further.
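
A round-trip check should pin it down. A sketch of what I mean, where encode and decode are hypothetical stand-ins for the transport codec (all-zero 4-byte groups get special treatment in ascii85, the 'z' shorthand, so an all-zero buffer is a useful test case):

const input = new Uint8Array(4096); // all zeroes: exercises the 'z' group shorthand
const output = new Uint8Array(decode(encode(input))); // hypothetical codec under test
console.assert(
  input.length === output.length && input.every((byte, i) => byte === output[i]),
  "ascii85 round trip changed the data"
);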

donl commented 1 year ago

Here is an updated snippet; it includes a working example in the comments.

import { AceBase } from "acebase";
import { AceBaseClient } from "acebase-client";
import { AceBaseServer } from "acebase-server";

const options = {
  path: "./server",
  host: "localhost",
  port: 5758,
  authentication: {
    enabled: false,
  },
};

const areEqual = (first, second) =>
  first.length === second.length &&
  first.every((value, index) => value === second[index]);

const server = new AceBaseServer("default", options);

server.db.ref("*").on("child_added", async (snap) => {
  const val = snap.val();
  if (val.binary) {
    const eventUint8Array = new Uint8Array(val.binary);

    // blobUint8Array is assigned below, before the set() call that fires this event
    console.assert(
      areEqual(eventUint8Array, blobUint8Array),
      "eventUint8Array !== blobUint8Array"
    );

  } else {
    console.log("Event - child_added: ", val);
  }
});

let settings = {
  host: "localhost",
  port: 5758,
  https: false,
  dbname: "default",
  logLevel: "verbose",
};

//const db = new AceBase();  // seems to work as expected...
const db = new AceBaseClient(settings);
await db.ready();

// this file fails between server <-> client
const url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Circle_-_black_simple.svg/480px-Circle_-_black_simple.svg.png";

// this file works between server <-> client
// const url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Circle_-_black_simple.svg/240px-Circle_-_black_simple.svg.png";

const response = await fetch(url);
if (!response.ok) {
  throw new Error(`HTTP error, status = ${response.status}`);
}

const blob = await response.blob();
const blobArrayBuffer = await blob.arrayBuffer();
const blobUint8Array = new Uint8Array(blobArrayBuffer);

const newFileRef = db.ref("files").push();
await newFileRef.set({
  name: new Date().toString(),
  type: blob.type,
  binary: blobUint8Array,
});

const snap = await db.ref(newFileRef.path).get();
const snapVal = snap.val();
const snapUint8Array = new Uint8Array(snapVal.binary);

console.assert(
  areEqual(snapUint8Array, blobUint8Array),
  "snapUint8Array !== blobUint8Array"
);

const aceBlob = new Blob([snapUint8Array], { type: snapVal.type });

const aceBlobArrayBuffer = await aceBlob.arrayBuffer();
const aceBlobUint8Array = new Uint8Array(aceBlobArrayBuffer);

console.assert(
  areEqual(snapUint8Array, aceBlobUint8Array),
  "snapUint8Array !== aceBlobUint8Array"
);

db.close();
server.shutdown();
donl commented 1 year ago

I just keep muddling my way through... It seems the crux of the issue is a missing continue here during encode:

if (!n) {
  result.push('z');
  continue;
}
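
For context, here is a minimal ascii85 group-encoder sketch (illustrative only, not AceBase's actual code). Without the continue, the encoder falls through and also emits the five-character group for n = 0 right after the 'z' shorthand, so every all-zero 4-byte group inflates and corrupts the output; that would also explain why small files without zero runs survived:

// Minimal ascii85 encoder for input whose length is a multiple of 4
// (illustrative sketch, not the actual AceBase implementation).
function encodeGroups(bytes) {
  const result = [];
  for (let i = 0; i < bytes.length; i += 4) {
    // Pack 4 bytes into one unsigned 32-bit big-endian value.
    let n =
      bytes[i] * 0x1000000 + (bytes[i + 1] << 16) + (bytes[i + 2] << 8) + bytes[i + 3];
    if (!n) {
      result.push("z"); // shorthand for an all-zero group
      continue; // without this, the 5-char group below is emitted as well
    }
    let group = "";
    for (let j = 0; j < 5; j++) {
      group = String.fromCharCode(33 + (n % 85)) + group; // base-85 digits, MSB first
      n = Math.floor(n / 85);
    }
    result.push(group);
  }
  return result.join("");
}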

While researching, I stumbled upon the Z85 encoding (z85-codec). It's interesting in that no escaping is necessary for JSON (e.g. the " character). A quick test with a random JPEG showed about 0.5% of the total space being taken up by escape characters. There might be some computational tradeoffs to weigh for and against, not to mention backwards compatibility.
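
Measuring that overhead is straightforward; encoded here stands in for any ascii85-encoded string, since the ascii85 alphabet includes both " and \, which JSON must escape:

const wire = JSON.stringify(encoded); // the string as embedded in a JSON payload
const overhead = wire.length - 2 - encoded.length; // minus the surrounding quotes
console.log(`escape overhead: ${((100 * overhead) / encoded.length).toFixed(2)}%`);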

Going down this rabbit hole also made me wonder about packing/squashing data into the high bits of the UTF-16 encoding used by IndexedDB.

appy-one commented 1 year ago

I think you nailed it with that continue. I'll run some more tests and release a new version.

appy-one commented 1 year ago

I published acebase-client v1.17.1, which fixes this issue on the client side; the server side is up next!

appy-one commented 1 year ago

I just published acebase-server v1.13.0, which fixes this issue on the server side.

Thanks for your help @donl, really appreciated! 💯