ferristseng / rust-ipfs-api

IPFS HTTP client in Rust
Apache License 2.0

insert CBOR dag nodes? #63

Open ec1oud opened 3 years ago

ec1oud commented 3 years ago

The IPLD data model is a superset of JSON, so it should be OK to store binary data. On the command line, ipfs dag put --input-enc cbor actually works. It's another option that this Rust implementation could perhaps support. Do you think it's doable? It looks like there's a pervasive assumption that the data is JSON.

The HTTP API is asymmetric though: there's no way to read back CBOR AFAICT (ipfs/go-ipfs#4313), so until that is done, perhaps there's no point.

I tried to force a byte array into a string, and ran into the problem that Rust expects every string to be valid UTF-8. If I use serde_json::json!(unsafe { String::from_utf8_unchecked(buf) }), it fails at runtime. So it seems the JSON API is hopeless for dealing with binary data in DAG nodes.
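To illustrate why packed floats cannot pass through a string, here is a minimal standard-library sketch. The helper names f32s_to_le_bytes and bytes_as_text are hypothetical and not part of rust-ipfs-api or serde_json. The safe String::from_utf8 check rejects such a buffer up front instead of failing later.

```rust
/// Serialize f32 values as contiguous little-endian bytes.
/// (Hypothetical helper, for illustration only.)
fn f32s_to_le_bytes(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Try to interpret raw bytes as text.
/// Returns None when the bytes are not valid UTF-8.
fn bytes_as_text(buf: Vec<u8>) -> Option<String> {
    String::from_utf8(buf).ok()
}
```

For example, 1.0f32 is stored as the bytes 00 00 80 3f, and a lone 0x80 byte is not valid UTF-8, so bytes_as_text returns None for that buffer while plain ASCII input round-trips.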

The reason I want to do that is to store a contiguous array of numbers directly as a byte array, to avoid CBOR overhead. An array of numbers in CBOR has a one-byte prefix in front of each number to declare its type. If you already know the type, that's a waste of space, and it prevents passing the array unconverted to other software (for example to draw a line graph). So I'd rather have the DAG node use CBOR to annotate the expected data type, with the actual array of numbers stored as a plain byte array. It's fine to construct CBOR that way, but getting it into and out of DAG nodes is problematic so far.
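The size difference can be sketched with a small hand-rolled encoder that follows the CBOR head rules (RFC 8949). This is an illustration using only the standard library, not the output of serde_cbor or go-ipfs. All function names are hypothetical, and a real encoder may pick a different float width for some values.

```rust
/// Encode a CBOR head: major type (0..=7) plus its length or value argument.
fn cbor_head(major: u8, arg: u64) -> Vec<u8> {
    let m = major << 5;
    if arg < 24 {
        vec![m | arg as u8]
    } else if arg <= 0xff {
        vec![m | 24, arg as u8]
    } else if arg <= 0xffff {
        let mut out = vec![m | 25];
        out.extend_from_slice(&(arg as u16).to_be_bytes());
        out
    } else if arg <= 0xffff_ffff {
        let mut out = vec![m | 26];
        out.extend_from_slice(&(arg as u32).to_be_bytes());
        out
    } else {
        let mut out = vec![m | 27];
        out.extend_from_slice(&arg.to_be_bytes());
        out
    }
}

/// CBOR array (major type 4) where every element is a binary32 float:
/// one type byte (0xfa) followed by 4 big-endian bytes per element.
fn encode_float_array(values: &[f32]) -> Vec<u8> {
    let mut out = cbor_head(4, values.len() as u64);
    for v in values {
        out.push(0xfa);
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}

/// CBOR byte string (major type 2) holding the floats as packed
/// little-endian bytes, with no per-element prefix.
fn encode_packed_bytes(values: &[f32]) -> Vec<u8> {
    let mut out = cbor_head(2, (values.len() * 4) as u64);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Under these assumptions, 1000 floats take 5003 bytes as a CBOR array and 4003 bytes as a single byte string, and the byte string payload can be handed to other software without conversion.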

ec1oud commented 3 years ago

Of course it works to use block_get and then parse the data with serde_cbor. It took me a while to figure out how to get Rust to convert a byte vector to an f32 vector, but eventually I succeeded. But if I update the data (append another number to the byte array, i.e. append the 4 bytes of a little-endian float) and then write the CBOR back via block_put, ipfs dag get no longer works on the command line: it seems to assume that the data is protobuf instead of CBOR.

$ ipfs dag get QmfBs7HAJTqCXS8RgXBiMSj3b5fCKSerM2fkSvTMgNYTrK
Error: failed to decode Protocol Buffers: incorrectly formatted merkledag node: unmarshal failed. proto: PBNode: wiretype end group for non-group

whereas writing the CBOR data to a file and then running ipfs dag put --input-enc cbor updated.cbor is fine. So it would help to get this option added to dag_put, but at least I'm no longer completely stuck in the meantime, because I can write an IPFS client app that works with the raw CBOR data either way.
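For reference, the byte-vector-to-f32 conversion and the append step mentioned above can be done with the standard library alone. This is a minimal sketch, not necessarily the code used here. The function names are hypothetical, and it assumes the buffer holds packed little-endian binary32 values.

```rust
/// Decode packed little-endian bytes into f32 values.
/// Returns None if the length is not a multiple of 4.
fn le_bytes_to_f32s(buf: &[u8]) -> Option<Vec<f32>> {
    if buf.len() % 4 != 0 {
        return None;
    }
    Some(
        buf.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Append one value to the packed buffer as 4 little-endian bytes.
fn append_f32(buf: &mut Vec<u8>, value: f32) {
    buf.extend_from_slice(&value.to_le_bytes());
}
```

Copying into a fresh Vec<f32> avoids any alignment concerns that reinterpreting the byte buffer in place would raise.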

I also tried tagging: https://www.endpoint.com/blog/2019/03/18/extensible-binary-encoding-with-cbor says “IEEE 754 binary32, little endian, Typed Array” is tag 85. That would probably be the right thing to do; but go-ipfs itself can't deal with that: ipfs dag put --input-enc cbor says "missing an unmarshaller for tag 85".
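As an illustration of what the tagged form looks like on the wire, here is a hand-rolled sketch that wraps packed little-endian floats in tag 85 followed by a byte string. It uses only the standard library, the function name encode_tag85 is hypothetical, and it only handles payloads under 65536 bytes to stay short.

```rust
/// Wrap packed little-endian binary32 data in CBOR tag 85 plus a byte string.
/// Returns None for payloads of 65536 bytes or more (not handled in this sketch).
fn encode_tag85(values: &[f32]) -> Option<Vec<u8>> {
    let byte_len = values.len().checked_mul(4)?;
    // 0xd8 = major type 6 (tag) with a one-byte tag number following; 85 = 0x55.
    let mut out = vec![0xd8, 85];
    // Byte string head (major type 2) sized to the payload length.
    match byte_len {
        0..=23 => out.push(0x40 | byte_len as u8),
        24..=0xff => out.extend_from_slice(&[0x58, byte_len as u8]),
        0x100..=0xffff => {
            out.push(0x59);
            out.extend_from_slice(&(byte_len as u16).to_be_bytes());
        }
        _ => return None,
    }
    // Payload: each float as 4 little-endian bytes, no per-element prefix.
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    Some(out)
}
```

For two floats this yields an 11-byte item: d8 55, then 48 for an 8-byte string, then the packed data. A decoder that does not know tag 85 has to either reject the item, as described above, or pass the inner byte string through untouched.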