RumbleDB / rumble

⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

Error returned instead of returning empty sequence #89

Closed ingomueller-net closed 5 years ago

ingomueller-net commented 6 years ago

The following query (where array access [[0]] is mistakenly written as array constructor [0]) causes the job to abort:

jiqs$ for $o in json-file("wasb:///sample.json")
>>> where $o.choices[0] eq $o.target
>>> return $o
>>>
>>>
[ERROR] Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 29, wn0-testsp.mjhvh3c1nveurdkkutux1zeupg.cx.internal.cloudapp.net, executor 2): sparksoniq.exceptions.IteratorFlowException: Error [err: XPDY0130]LINE:2:COLUMN:6:Invalid next() call;

I guess a syntax error (or at least some more high-level error) would be more appropriate.

ghislainfourny commented 6 years ago

Thanks, Ingo.

This is not a syntax error. [0] filters for a sequence position (in this case, the sequence has only one item, the array), but positions start at 1, so $o.choices[0] should return an empty sequence. Then eq returns () as well, where treats that as false, and so the overall query should return ().

But there is indeed a bug in our iterators that we need to fix.

FYI if you need the first choice, then the syntax for this should be

$o.choices[[1]]

(array lookup is [[ ]], and positions start at 1 in JSONiq).
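To make the difference concrete, here is a small self-contained sketch. The inline object bound to $o and its values are made up for illustration and stand in for one line of sample.json. The expected results in the comments follow from the JSONiq semantics described above, not from the engine's behavior at the time of this issue.

```jsoniq
(: Hypothetical stand-in for one object from sample.json. :)
let $o := { "choices": [ "a", "b" ], "target": "a" }
return {
  (: Sequence position 0 does not exist, so this is the empty sequence: [ ] :)
  "filter-0": [ $o.choices[0] ],
  (: Position 1 of a one-item sequence is the array itself: [ [ "a", "b" ] ] :)
  "filter-1": [ $o.choices[1] ],
  (: Array lookup, first member of the array: [ "a" ] :)
  "lookup-1": [ $o.choices[[1]] ],
  (: Comparing the first choice with the target: true :)
  "matches": $o.choices[[1]] eq $o.target
}
```

Each result is wrapped in an array constructor only so that an empty sequence is visible in the output as [ ].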

ingomueller-net commented 6 years ago

OK, I think I understand the difference now. So, as you say, the only problem is with the iterator.

ghislainfourny commented 5 years ago

I am happy to announce that this is now solved :)