ArkThis / AHAlodeck

The place to be for discussing future any-data layouts.
GNU General Public License v3.0

How arbitrary can the data get? #7

Closed ross-spencer closed 2 days ago

ross-spencer commented 6 days ago

Following up on: https://github.com/steffenfritz/FileTrove/issues/110

Alternate Data Streams in NTFS look more "powerful" than I first considered, but do Linux xattrs for example perform the same way?

The example shows how to attach a text file to an ADS in NTFS.

My instinct is that to do this with xattrs, any arbitrary file would need to be encoded as base64 or a similar text representation, but I'm not sure. It might be worth verifying and understanding the impact.
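A minimal sketch for verifying this on Linux, assuming Python 3 and a filesystem that allows `user.*` xattrs. The attribute name `user.payload` and the helper `xattr_roundtrip` are made up for illustration. The Linux `setxattr` call takes a raw byte buffer, so the sketch writes every possible byte value without base64 and reads it back; what differs between filesystems is the size limit, not the encoding.

```python
import os
import tempfile
from typing import Optional


def xattr_roundtrip(directory: str, payload: bytes) -> Optional[bytes]:
    """Store payload in a user xattr and read it back.

    Returns None where xattrs are unavailable (non-Linux platform,
    filesystem without user xattrs, or payload over the size limit).
    """
    if not hasattr(os, "setxattr"):  # os.setxattr only exists on Linux
        return None
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # "user.payload" is a hypothetical attribute name for this test.
        os.setxattr(path, "user.payload", payload)
        return os.getxattr(path, "user.payload")
    except OSError:
        return None
    finally:
        os.remove(path)


payload = bytes(range(256))  # every byte value, including NUL and non-UTF-8
result = xattr_roundtrip(tempfile.gettempdir(), payload)
if result is None:
    print("xattrs not available here")
else:
    print("binary round trip intact:", result == payload)
```

Pointing `directory` at a mount of the filesystem in question (ext4, ZFS, an SMB share) and growing `payload` should show where each one starts to refuse the write.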

pjotrek-b commented 4 days ago

AFAIK (and have tested some on ext4/smbd/zfs on Xubuntu 20.04):

(*) Except with Redis: it works like a charm to just store "the file data" as yet another metadata field, called "payload". I tried it with MP3 files, and it works perfectly without base64 (or does Python do it internally?)

pjotrek-b commented 4 days ago

Related discussions on the MinIO GitHub, as part of "my quest to use MinIO for AHAlodeck use".

I was sincerely, eh... disrupted - to find that MinIO only does 127 ASCII-only, with a seemingly very limited charset. Like "no spaces or colons": I tried to simply do tags=this,that and, something other,etc - and I got "S3 says no". :smiling_face_with_tear:

So my plans are towards a completely data-type-irrelevant, filter-and-match-and-transform on-the-fly MacGyver-minded, LEGO engine :wink: . Meta /is/ Data - and I've written Redis proof-of-concepts which perform "as expected".

Which in fact is /very/ good.

pjotrek-b commented 4 days ago

I was surprised to find that older, pre-S3 object storage implementations (Ceph, Hadoop?, OrangeFS, OpenStack Swift, etc.) provide proper xattr (extended file attribute) conformity, at least up to the currently common 4k (ext4?) limit.

extended, different limits

btrfs, apple-filesystem(s)?, and some other Linux filesystems have higher limits like 64k per value (legend has it?) - or Haiku's BeFS with "all data is equal and welcome and accessible by design" - until your system runs out of whatever storage :wink:

4k block limit for all keys+values.

So taking ext4 xattr capabilities (4kB AFAIK) as the prerequisite for the preferred AHAlodeck "tagging environment" would give something comparable to "safe filenames" for metadata information.
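To get a feel for what a 4 kB budget means if values were base64-encoded anyway, here is a rough calculation. It assumes the whole 4096 bytes are available to a single value, which ignores key names and any filesystem overhead; the constant and helper names are illustrative only.

```python
import base64

XATTR_BUDGET = 4096  # assumed budget in bytes, taken from the "4kB AFAIK" figure


def base64_size(raw_len: int) -> int:
    """Length of the padded base64 text for raw_len bytes of input."""
    return 4 * ((raw_len + 2) // 3)


def max_raw_payload(budget: int, as_base64: bool) -> int:
    """Largest raw payload (in bytes) whose stored form fits the budget."""
    if not as_base64:
        return budget
    # Every 4 base64 characters carry 3 raw bytes.
    return (budget // 4) * 3


raw_limit = max_raw_payload(XATTR_BUDGET, as_base64=False)
b64_limit = max_raw_payload(XATTR_BUDGET, as_base64=True)
print(f"raw bytes: {raw_limit}, via base64: {b64_limit}")

# Cross-check the formula against the real encoder.
assert len(base64.b64encode(bytes(b64_limit))) == base64_size(b64_limit)
```

Under these assumptions base64 costs a quarter of the budget, which is one more reason to store binary values directly where the storage layer allows it.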

I definitely suggest full Unicode (UTF-8 at least) (downwards-)compatible environment implementations. This should become as go-to-trivial as ASCII these days IMO. :yum:
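A small illustration of the "(downwards-)compatible" part, as a sketch: ASCII text produces identical bytes under UTF-8, while non-ASCII characters take more than one byte each, which matters once a limit is counted in bytes rather than characters. The helper name `utf8_cost` and the sample tag values are made up for this example.

```python
def utf8_cost(text: str) -> tuple:
    """Return (character count, UTF-8 byte count) for a tag value."""
    return len(text), len(text.encode("utf-8"))


# Pure ASCII text is byte-for-byte the same in ASCII and UTF-8.
assert "this,that".encode("ascii") == "this,that".encode("utf-8")

for value in ["this,that", "something other", "Über-Größe", "日本語"]:
    chars, size = utf8_cost(value)
    print(f"{value!r}: {chars} chars, {size} bytes")
```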

Size/char limits: zawos?

de_at:zawos = en_pt:why bother?

Seeing how well redis/keydb perform already on local off-the-shelf hardware like even my notebook: We're in for a fun ride.

I do believe in going towards a generically scalable design (like Haiku, as I've understood it in a nutshell - and from my tests). I simply enjoy thinking of data as "something that I either have an ID of (and for) - or the means to get one". And an "as-wide-or-local as desired" setup, as easy as "apt install lamp"?

And then let the filesystem(s) deal with index/caching/storing meta+data on proper underlying tech.

World-Wide-Storage

This would allow full filesystem-capabilities-interoperability

(the cloud rained down, and manifested in a collective network of interconnected memory "object graphs")

I will spawn a separate discussion/issue for this one. Titled "Imagine you can access any data anywhere as easy as opening a file or folder "tree" selection in your file manager?"

pjotrek-b commented 4 days ago

Having self-describing data objects, in related graphs, possibly referring to proper, accessible-anywhere iiif.io viewer manifests?

The manifest JSON auto-generated by most default applications: just like that. You can simply expect to double-or-right-click any object on anything and get decent, expected behavior. Soothingly stable and simple. It may require more (or even less) RAM than before, but it's definitely worth figuring out!

To me it feels like being a kid who's entered his first LEGO store. Thank you inventors! (Is there no emoji for "LEGO"? :scream:)

Data may get macgyver-ably arbitrary and interoperable at all times. Why not set "bit-proof" as base requirement? And provide fallback means out-of-the-box as common filesystem functions?

Curious to hear your opinions and thoughts on this! :smile: