RumbleDB / rumble

⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

Lazy load base64Binary and hexBinary #930

Open jsommr opened 3 years ago

jsommr commented 3 years ago

A string is currently required to create one of the binary types. Would it be possible to have a constructor that accepts bytes and does not compute the stringValue until the item is actually serialized? This would let RumbleDB handle intermediate binaries efficiently when serializing them isn't necessary (imagine a query that downloads an image and returns, as text, the objects it contains).
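To make the proposal concrete, here is a minimal Java sketch of a binary item that stores raw bytes and only builds its lexical (string) form on first request. `LazyBinaryItem`, `fromBytes`, `getBase64String`, `getHexString` and `isStringMaterialized` are hypothetical names chosen for illustration; they are not RumbleDB's actual item classes or API.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch, not RumbleDB's real item implementation.
final class LazyBinaryItem {
    private final byte[] bytes;   // the authoritative value
    private String base64Cache;   // computed on first request only
    private String hexCache;      // computed on first request only

    private LazyBinaryItem(byte[] bytes) {
        this.bytes = bytes;
    }

    // Build directly from bytes: no string is created at construction time.
    static LazyBinaryItem fromBytes(byte[] bytes) {
        return new LazyBinaryItem(Arrays.copyOf(bytes, bytes.length));
    }

    // Intermediate steps (e.g. passing an image to a classifier) use the bytes directly.
    byte[] getBytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    boolean isStringMaterialized() {
        return base64Cache != null || hexCache != null;
    }

    // Lexical base64Binary form, materialized only when serialization needs it.
    // Not synchronized: a concurrent race would just recompute the same immutable String.
    String getBase64String() {
        if (base64Cache == null) {
            base64Cache = Base64.getEncoder().encodeToString(bytes);
        }
        return base64Cache;
    }

    // Canonical (uppercase) hexBinary form, also computed lazily.
    String getHexString() {
        if (hexCache == null) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                sb.append(String.format("%02X", b));
            }
            hexCache = sb.toString();
        }
        return hexCache;
    }
}
```

With this design, an item built from three bytes `{0x01, 0x02, 0xFF}` carries no string at all until `getHexString()` or `getBase64String()` is called, so a query that never serializes the binary never pays for the encoding.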

ghislainfourny commented 3 years ago

Thank you for your comment.

Indeed, some other JSONiq engines do support this kind of lazy loading.

If more functions are added that provide alternative ways to construct hexBinary or base64Binary values (for example, a function that GETs a binary file over HTTP or from any other system), there is no reason to materialize the string, and Rumble would simply keep the bytes.

Do you have a specific use case in mind, in particular, would you like Rumble to be able to download binary files from the Web or open binary files from any file system?

jsommr commented 3 years ago

I was just curious because it seemed easy to implement, and I wanted to make sure I hadn't missed something before starting. We've put Spark on hold for now, so this issue can be closed unless it makes sense to keep it open.