RumbleDB / rumble

⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

A few useful functions to implement first #16

Closed: ghislainfourny closed this issue 6 years ago

ghislainfourny commented 6 years ago

concat, string-join, substring

(they are used in the tutorial)

wscsprint3r commented 6 years ago

Started working on this.

wscsprint3r commented 6 years ago

@ghislainfourny what does the substring function return when the indexes are out of bounds or negative?

ghislainfourny commented 6 years ago

Hi Stefan.

All non-JSONiq-specific functions are ultimately documented on this page, even though of course JSONiq only supports a subset: https://www.w3.org/TR/xpath-functions-30/#func-substring

The official JSONiq subset is listed here (for future reference): http://www.jsoniq.org/docs/JSONiq/html-single/index.html#idm41652848

And the JSONiq-specific functions are here (also for future reference): http://www.jsoniq.org/docs/JSONiq/html-single/index.html#idm34604304

Concretely regarding substring: if the indices are out of bounds, the function "elegantly returns" instead of raising an error. A start position below 1 makes the result begin at the first character (the length is still counted from the given start position, so the result can be shorter than the requested length), and if the length runs past the end, the function returns the characters up to the end of the source string. This is NoSQL flexibility that users appreciate when they have billions of documents to process.

Also, indices start at 1, not at 0.
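
To make the out-of-bounds behaviour concrete, here is a minimal sketch in Python of the rule given in the linked XPath specification: a character at 1-based position p is kept when round(start) <= p < round(start) + round(length). It is only an illustration, not the code that was merged for this issue. The helper names `xpath_round` and `substring` are made up for the example, and it assumes finite numeric arguments (NaN and infinities are not handled).

```python
import math
from typing import Optional


def xpath_round(x: float) -> int:
    # Hypothetical helper: rounds to the nearest integer, with ties going
    # toward positive infinity, as fn:round does in the XPath specification.
    return math.floor(x + 0.5)


def substring(source: str, start: float, length: Optional[float] = None) -> str:
    # Illustrative sketch of fn:substring semantics (finite arguments only).
    first = xpath_round(start)
    # Without a length, everything from `first` to the end is kept.
    end = math.inf if length is None else first + xpath_round(length)
    # Positions are 1-based; out-of-range positions simply match nothing.
    return "".join(
        ch for pos, ch in enumerate(source, start=1) if first <= pos < end
    )


print(substring("metadata", 4, 3))   # ada
print(substring("12345", 0, 3))      # 12
print(substring("12345", -3, 5))     # 1
print(substring("hello", 2, 100))    # ello
```

Note that `substring("12345", 0, 3)` yields `12` rather than `123`: position 0 does not exist, but it still consumes one unit of the requested length.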

Thanks!

wscsprint3r commented 6 years ago

Merged

https://github.com/Sparksoniq/sparksoniq/pull/26/commits/5981753c27a918c8d75a02848907bcedd7522ff8