fluree / db

Fluree database library
https://fluree.github.io/db/

Unicode of first char of IRI halts DB #924

Open bplatz opened 3 weeks ago

bplatz commented 3 weeks ago

Non-ASCII Unicode characters should be valid in IRIs, and this works today except when one is the first character after the namespace.

This works: {"@id": "ex:aஃ", ...}

This halts the db: {"@id": "ex:ஃb", ...}
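To make the difference between the two names concrete, here is a small sketch that prints their UTF-8 bytes. The class and method names (`IriNameBytes`, `utf8Hex`, `firstUtf8Byte`) are made up for illustration and are not part of Fluree. It only shows that the failing name starts with a byte whose high bit is set (0xE0, which is negative as a signed Java byte); whether that is what trips the db is an assumption, not something established at this point in the thread.

```java
import java.nio.charset.StandardCharsets;

public class IriNameBytes {
    // U+0B83 is the Tamil character used in the examples above; it is written
    // as an escape so the source file encoding does not matter.
    static final String WORKS = "a\u0B83";
    static final String HALTS = "\u0B83b";

    /** First UTF-8 byte of s, as Java sees it (a signed byte). */
    static byte firstUtf8Byte(String s) {
        return s.getBytes(StandardCharsets.UTF_8)[0];
    }

    /** UTF-8 encoding of s as space-separated hex pairs. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("works: " + utf8Hex(WORKS) + " (first byte signed: " + firstUtf8Byte(WORKS) + ")");
        System.out.println("halts: " + utf8Hex(HALTS) + " (first byte signed: " + firstUtf8Byte(HALTS) + ")");
    }
}
```

The working name encodes to `61 E0 AE 83` (first byte 97), the failing one to `E0 AE 83 62` (first byte -32), so the only byte-level difference is which byte comes first.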

A pending test that fails is here (be warned: if you run it, the REPL becomes completely unresponsive and you'll have to kill and restart it): https://github.com/fluree/db/blob/07128d2047882313b69fa14464472631c305c03b/test/fluree/db/transact/transact_test.clj#L551C1-L583C43

What I've found is that the transaction in the failed test above will make it all the way to fluree.db.flake.transact/final-db here: https://github.com/fluree/db/blob/07128d2047882313b69fa14464472631c305c03b/src/clj/fluree/db/flake/transact.cljc#L94C1-L117C20

However, something is terribly wrong with new-flakes in this fn. You can (count new-flakes), but any attempt to print/log/perform logic on new-flakes creates the stall.

Clearly the flake(s) are bad, and my guess is logging the flakes creates an error condition in our print methods for flakes: https://github.com/fluree/db/blob/07128d2047882313b69fa14464472631c305c03b/src/clj/fluree/db/flake.cljc#L131-L137 such that the JVM/REPL is completely blocked.

My thought on next steps is to (a) update the print method so we can at least see the flake issue and fix it, and/or (b) start diagnosing the steps before the flake is created to see if the issue can be discovered there. Flake creation is done here: https://github.com/fluree/db/blob/07128d2047882313b69fa14464472631c305c03b/src/clj/fluree/db/query/exec/update.cljc#L81-L93

This relates to issue https://github.com/fluree/db/issues/918 and to the PR that added the test: https://github.com/fluree/db/pull/923

bplatz commented 3 weeks ago

Traced this now to the SID Java implementation. The unresponsiveness is from an OutOfMemoryError: if you wait long enough, the REPL will return with the stack trace once memory is exhausted.

Calling (fluree.db.json-ld.iri/->sid 101 "ஃb") reproduces this issue.

Stack trace:

ERROR fluree.db.api - #error {
 :cause "Java heap space"
 :via
 [{:type java.lang.OutOfMemoryError
   :message "Java heap space"
   :at [java.lang.Long valueOf "Long.java" 1199]}]
 :trace
 [[java.lang.Long valueOf "Long.java" 1199]
  [clojure.lang.Numbers num "Numbers.java" 1840]
  [fluree.db.util.bytes$long__GT_UTF8 invokeStatic "bytes.cljc" 48]
  [fluree.db.util.bytes$long__GT_UTF8 invoke "bytes.cljc" 37]
  [clojure.core$map$fn__5935 invoke "core.clj" 2772]
  [clojure.lang.LazySeq sval "LazySeq.java" 42]
  [clojure.lang.LazySeq seq "LazySeq.java" 51]
  [clojure.lang.RT seq "RT.java" 535]
  [clojure.core$seq__5467 invokeStatic "core.clj" 139]
  [clojure.core$apply invokeStatic "core.clj" 662]
  [clojure.core$mapcat invokeStatic "core.clj" 2800]
  [clojure.core$mapcat doInvoke "core.clj" 2800]
  [clojure.lang.RestFn invoke "RestFn.java" 423]
  [fluree.db.json_ld.iri$codes__GT_name invokeStatic "iri.cljc" 118]
  [fluree.db.json_ld.iri$codes__GT_name invoke "iri.cljc" 115]
  [fluree.db.json_ld.iri$get_name invokeStatic "iri.cljc" 142]
  [fluree.db.json_ld.iri$get_name invoke "iri.cljc" 140]
  [fluree.db.json_ld.iri$measure_sid invokeStatic "iri.cljc" 154]
  [fluree.db.json_ld.iri$measure_sid invoke "iri.cljc" 149]
  [fluree.db.flake$size_flake invokeStatic "flake.cljc" 573]
  [fluree.db.flake$size_flake invoke "flake.cljc" 559]
  [fluree.db.flake$size_bytes$fn__12342 invoke "flake.cljc" 603]
  [clojure.data.avl$avl_set_reduce invokeStatic "avl.clj" 1179]
  [clojure.data.avl$avl_set_reduce invoke "avl.clj" 1173]
  [clojure.data.avl.AVLSet reduce "avl.clj" 1764]
  [clojure.core$reduce invokeStatic "core.clj" 6886]
  [clojure.core$reduce invoke "core.clj" 6869]
  [fluree.db.flake$size_bytes invokeStatic "flake.cljc" 603]
  [fluree.db.flake$size_bytes invoke "flake.cljc" 600]
  [fluree.db.json_ld.commit_data$update_novelty invokeStatic "commit_data.cljc" 410]
  [fluree.db.json_ld.commit_data$update_novelty invoke "commit_data.cljc" 398]
  [fluree.db.flake.transact$final_db$fn__33603$state_machine__6719__auto____33630$fn__33633 invoke "transact.cljc" 107]]}
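One hypothesis consistent with the trace above (not confirmed in this thread): the OOM surfaces inside `long->UTF8`, and a decode loop of the form "shift right by 8 until the value is zero" never terminates for a negative long when the shift is arithmetic, because the value settles at -1 instead of 0. The sketch below illustrates only that JVM behavior. The left-aligned packing in `packLeftAligned` and every name in the class are assumptions for illustration, not Fluree's actual encoding or API.

```java
import java.nio.charset.StandardCharsets;

public class ShiftLoopSketch {
    /**
     * Hypothetical packing: puts up to 8 UTF-8 bytes of s into a long,
     * first byte in the most significant position, zero-padded on the right.
     */
    static long packLeftAligned(String s) {
        byte[] bs = s.getBytes(StandardCharsets.UTF_8);
        long n = 0;
        for (int i = 0; i < bs.length && i < 8; i++) {
            n |= (long) (bs[i] & 0xFF) << (8 * (7 - i));
        }
        return n;
    }

    /**
     * Counts how many "shift right by 8" steps it takes to reach zero.
     * Returns -1 if zero is not reached within limit steps, which stands in
     * for a loop that would otherwise run (and allocate) forever.
     */
    static int stepsUntilZero(long n, boolean unsignedShift, int limit) {
        int steps = 0;
        while (n != 0) {
            if (steps == limit) return -1;
            n = unsignedShift ? (n >>> 8) : (n >> 8);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        long works = packLeftAligned("a\u0B83"); // top byte 0x61, long is positive
        long halts = packLeftAligned("\u0B83b"); // top byte 0xE0, long is negative
        System.out.println("works, >>  : " + stepsUntilZero(works, false, 1000));
        System.out.println("halts, >>  : " + stepsUntilZero(halts, false, 1000));
        System.out.println("halts, >>> : " + stepsUntilZero(halts, true, 1000));
    }
}
```

Under these assumptions the positive value reaches zero in 8 steps, the negative value never does with `>>`, and it reaches zero in 8 steps with `>>>`. If the real code has this shape, switching to an unsigned shift or bounding the loop at 8 bytes would be the kind of fix to look at, but that needs checking against `bytes.cljc` and the SID class.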