clj-commons / byte-streams

A Rosetta stone for JVM byte representations

Late declarations of lower-cost conversions are ineffective #61

Open DerGuteMoritz opened 2 years ago

DerGuteMoritz commented 2 years ago

Declaring a new conversion via def-conversion that would make converting between two types less costly is ineffective if that conversion has already occurred at least once. The cause is the global converter memoization, which captures the state of the conversion graph at the time of the very first invocation for a given pair of source and dest types.

Reproducer:

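;; Assumes byte-streams 0.3.x, where the namespace is `clj-commons.byte-streams`;
;; older releases expose it as `byte-streams`.
(require '[clj-commons.byte-streams :as bs])

(defrecord Foo [data])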
(defrecord Bar [data])

(defn run [n]
  (prn :begin n)
  (bs/convert (Foo. "foo") String)
  (prn :end n)
  (newline))

(bs/def-conversion ^{:cost 0} [Foo Bar]
  [x _]
  (prn :foo->bar)
  (Bar. (:data x)))

(bs/def-conversion ^{:cost 0} [Bar String]
  [x _]
  (prn :bar->string)
  (:data x))

;; At this point we can only indirectly convert from `Foo` to `String` via `Bar`
(run 1)

;; Now we declare a direct conversion path from `Foo` to `String`
(bs/def-conversion ^{:cost 0} [Foo String]
  [x _]
  (prn :foo->string)
  (:data x))

;; But because the first invocation has already memoized the more costly path, it has no effect
(run 2)

Output:

:begin 1
:foo->bar
:bar->string
:end 1

:begin 2
:foo->bar
:bar->string
:end 2

In contrast, moving both run invocations after the last bs/def-conversion outputs:

:begin 1
:foo->string
:end 1

:begin 2
:foo->string
:end 2

This is a bit of a gotcha which might at least be worth documenting. Alternatively, def-conversion could reset the memo which should solve this issue.
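To illustrate the second option, here is a minimal, self-contained sketch of the idea rather than byte-streams' actual implementation. All names in it (`conversions`, `path-memo`, `register-conversion!`, `find-path`) are hypothetical stand-ins, and the graph search is a naive exhaustive one. It only shows how clearing the memo on registration lets a later, cheaper conversion take effect:

```clojure
;; Hypothetical stand-ins; none of these names are part of byte-streams.
(def conversions (atom {}))  ; {[src dst] cost}
(def path-memo   (atom {}))  ; {[src dst] path}

(defn- search
  "Naive exhaustive search for the cheapest acyclic path from src to dst.
  Returns {:cost n :path [...]} or nil if dst is unreachable."
  [graph src dst seen]
  (if (= src dst)
    {:cost 0 :path [dst]}
    (let [candidates (for [[[a b] cost] graph
                           :when (and (= a src) (not (seen b)))
                           :let [tail (search graph b dst (conj seen b))]
                           :when tail]
                       {:cost (+ cost (:cost tail))
                        :path (into [src] (:path tail))})]
      (when (seq candidates)
        (apply min-key :cost candidates)))))

(defn find-path
  "Returns the memoized path for [src dst], computing it on first use."
  [src dst]
  (if-let [hit (get @path-memo [src dst])]
    hit
    (let [path (:path (search @conversions src dst #{src}))]
      (swap! path-memo assoc [src dst] path)
      path)))

(defn register-conversion!
  "Adds an edge to the graph and drops every memoized path, so that the
  next lookup sees the updated graph."
  [src dst cost]
  (swap! conversions assoc [src dst] cost)
  (reset! path-memo {})
  nil)

;; Mirrors the reproducer: indirect path first, direct path declared later.
(register-conversion! :foo :bar 1)
(register-conversion! :bar :string 1)
(find-path :foo :string)

(register-conversion! :foo :string 1)
(find-path :foo :string)
```

With the `reset!` in place, the second lookup re-runs the search and picks the direct edge. The trade-off is that every registration throws away all memoized paths, so the first conversion afterwards pays the search cost again.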

DerGuteMoritz commented 2 years ago

Just found https://github.com/clj-commons/byte-streams/issues/10 which also touches on the issue of the hidden initialization cost which resulted in the introduction of a precache-conversions API. This didn't solve the early capturing issue but at least is some prior art in the spirit of my suggestion. However, it later got removed again without further explanation. Hm!

DerGuteMoritz commented 2 years ago

The main issue here could also be solved by invalidating the memo whenever new conversions are declared. However, that alone wouldn't address the performance gotcha, so I decided to break it out into its own issue.

KingMob commented 2 years ago

It's important to articulate the problem; is this a real concern for anyone?

The only non-toy/demo/example use of def-conversion I found anywhere on GitHub was in clj-fdb, and that was for a new conversion, and in the tests.

This issue is currently theoretical afaict, and I don't think anyone should spend time on it. 😄