Clojure-Intro-Course / babel

MIT License
2 stars 0 forks source link

Restructure `into` spec to better fit possible usages #64

Open stanislowskij opened 1 month ago

stanislowskij commented 1 month ago

into is simultaneously under-spec'd and over-spec'd right now due to some weird edge cases that come with the function. Officially, the documentation notes that, with two arguments, into should take two collections, but there are several cases where this is not true.


stanislowskij commented 1 month ago

I will leave the internal definition of into here in case it's useful for analyzing, finding more cases / a better way to approach this:

(defn into
  "Returns a new coll consisting of to with all of the items of
  from conjoined. A transducer may be supplied.
  (into x) returns x. (into) returns []."
  {:added "1.0"
   :static true}
  ([] [])
  ([to] to)
  ([to from]
     (if (instance? clojure.lang.IEditableCollection to)
       (with-meta (persistent! (reduce conj! (transient to) from)) (meta to))
       (reduce conj to from)))
  ([to xform from]
     (if (instance? clojure.lang.IEditableCollection to)
       (let [tm (meta to)
             rf (fn
                  ([coll] (-> (persistent! coll) (with-meta tm)))
                  ([coll v] (conj! coll v)))]
         (transduce xform rf (transient to) from))
       (transduce xform conj to from))))
stanislowskij commented 1 month ago

This is our spec on into as it sits right now:

(s/fdef clojure.core/into
  :args (s/and :babel.arity/zero-to-three
               (s/or :arg-one (s/cat :any (s/? any?))
                     :arg-two (s/cat :coll (s/nilable :babel.type/coll) :any any?)
                     :arg-three (s/cat :coll (s/nilable :babel.type/coll) :function :babel.type/function-or-lazy :coll any?))))
stanislowskij commented 1 month ago

@elenam Here are some discoveries I made today while diving down this rabbit hole, which may influence how we rewrite these specs going forward. There are two key invariants on conj and reduce respectively which categorize all of these examples into two "cases" of unexpected behavior.

What does (reduce conj to from) actually do?

Case I: conj has this invariant which produces interesting behavior when called with reduce.

From documentation:

conj[oin]. Returns a new collection with the xs 'added'. (conj nil item) returns (item). (conj coll) returns coll. (conj) returns []. The 'addition' may happen at different 'places' depending on the concrete type.

(conj nil item) $\rightarrow$ (list item), NOT (seq item) which is an important distinction.

Therefore, we can expect that:

(into nil x) = (reduce conj nil x) $\rightarrow$ (reduce conj (list (first x)) (drop 1 x))

i.e.,

(into nil "hello") $\rightarrow$ (reduce conj '(\h) "ello") $\rightarrow$ (reduce conj '(\e, \h) "llo") ... $\rightarrow$ (\o, \l, \l, \e, \h)


Case II: reduce also has its own interesting invariant.

From documentation:

f should be a function of 2 arguments. If val is not supplied, returns the result of applying f to the first 2 items in coll, then applying f to that result and the 3rd item, etc. If coll contains no items, f must accept no arguments as well, and reduce returns the result of calling f with no arguments. If coll has only 1 item, it is returned and f is not called. If val is supplied, returns the result of applying f to val and the first item in coll, then applying f to that result and the 2nd item, etc. If coll contains no items, returns val and f is not called.

It's not stated here, but I suspect that, by "contains no items," they actually mean that (count coll) returns 0, since using "" and nil in place of an empty collection works the exact same.

Also, this invariant applies to cases with two arguments, in which coll is effectively ignored.

i.e., define empty such that (any? empty) and (= 0 (count empty)). Then:

With one argument: (into empty) = (reduce conj empty) $\rightarrow$ (conj) $\rightarrow$ []

With two arguments: (into x empty) = (reduce conj x empty) $\rightarrow$ (conj x) $\rightarrow$ x

stanislowskij commented 1 month ago

One proposed idea for spec on into:

one-argument: any?

two-arguments: ((first: coll?) and (second: (seqable? and non-empty?))) or ((first: any?) and second: (seqable? and empty?))