cnuernber / dtype-next

A Clojure library designed to aid in the implementation of high performance algorithms and systems.
Other
328 stars 20 forks source link

replacing & args with options maps #82

Closed daslu closed 1 year ago

daslu commented 1 year ago

Following the recent Zulip conversation about the tensor api, this issue is proposing to generally prefer options maps over a variable number of arguments (& args).

Except for making things more consistent, having a single argument may arguably make things more composable (when passing around options maps, etc.).

This applies to a few functions at the tech.v3.datatype.functional and tech.v3.tensor namespaces, and probably a few others.

@harold said:

Agree here that a single options map is preferable to varargs. There may be some reason why varargs are necessary in some case, but probably not, and making this more consistent is a good idea.

cnuernber commented 1 year ago

Could we get a list of functions here so we don't miss one when we do the work?

daslu commented 1 year ago

I'll look and create a list :+1:

daslu commented 1 year ago

Hi. As a first step, below are all the cases of varargs in functions and macros in the official API namespaces (those appearing under the :codox alias in deps.edn).

I will review them to determine which ones are relevant here (cases of options maps as varargs). (Examples for other uses of varargs: macros typically have & body varargs, and arithmetic functions typically support varying numbers of arguments (e.g., + has [x y & args])).

In the automated analysis below, I omitted a few namespaces because I could not require them in my local environment. But I checked them manually, and they did not introduce any additional cases.

(require '[tech.v3.dataset :as tmd]
         '[tech.v3.dataset.print :as print])

(def relevant-namespace-symbols
  (->> "deps.edn"
       slurp
       read-string
       :aliases
       :codox
       :exec-args
       :namespaces
       (filter (complement #{'tech.v3.datatype.jna
                             'tech.v3.datatype.char-input
                             'tech.v3.libs.neanderthal}))
       sort))

(apply require relevant-namespace-symbols)

(-> relevant-namespace-symbols
    (->> (mapcat (fn [namespace-symbol]
                   (->> namespace-symbol
                        the-ns
                        ns-publics
                        vals
                        (filter ifn?)
                        (sort-by str)
                        (mapcat (fn [f]
                                  (let [m (meta f)]
                                    (->> m
                                         :arglists
                                         (map (fn [arglist]
                                                {:ns namespace-symbol
                                                 :name (:name m)
                                                 :arglist arglist}))))))
                        (filter (fn [{:keys [arglist]}]
                                  (->> arglist
                                       (some #(= % '&)))))))))
    tmd/->dataset
    (print/print-range :all))

_unnamed [59 3]:

:ns :name :arglist
tech.v3.datatype emap [map-fn res-dtype x y & args]
tech.v3.datatype.errors throw-index-out-of-boundsf [msg & args]
tech.v3.datatype.errors throw-unimplemented [& _args]
tech.v3.datatype.errors throwf [message & args]
tech.v3.datatype.errors when-not-errorf [expr error-msg & args]
tech.v3.datatype.ffi c->string [data & [encoding]]
tech.v3.datatype.ffi define-foreign-interface [rettype argtypes & [{:as options}]]
tech.v3.datatype.ffi define-library-interface [fn-defs
&
{:keys [_classname check-error symbols libraries _header-files], :as opts}]
tech.v3.datatype.ffi instantiate-class [cls & args]
tech.v3.datatype.ffi string->c [str-data & [{:keys [encoding], :as options}]]
tech.v3.datatype.ffi.size-t with-size-t-size [new-size & body]
tech.v3.datatype.functional * [x y & args]
tech.v3.datatype.functional + [x y & args]
tech.v3.datatype.functional - [x y & args]
tech.v3.datatype.functional / [x y & args]
tech.v3.datatype.functional atan2 [x y & args]
tech.v3.datatype.functional bit-and [x y & args]
tech.v3.datatype.functional bit-and-not [x y & args]
tech.v3.datatype.functional bit-clear [x y & args]
tech.v3.datatype.functional bit-flip [x y & args]
tech.v3.datatype.functional bit-or [x y & args]
tech.v3.datatype.functional bit-set [x y & args]
tech.v3.datatype.functional bit-shift-left [x y & args]
tech.v3.datatype.functional bit-shift-right [x y & args]
tech.v3.datatype.functional bit-xor [x y & args]
tech.v3.datatype.functional equals [lhs rhs & args]
tech.v3.datatype.functional hypot [x y & args]
tech.v3.datatype.functional ieee-remainder [x y & args]
tech.v3.datatype.functional max [x y & args]
tech.v3.datatype.functional min [x y & args]
tech.v3.datatype.functional pow [x y & args]
tech.v3.datatype.functional quartile-outlier-fn [item & args]
tech.v3.datatype.functional quot [x y & args]
tech.v3.datatype.functional rem [x y & args]
tech.v3.datatype.functional unsigned-bit-shift-right [x y & args]
tech.v3.datatype.gradient diff1d [data & [options]]
tech.v3.datatype.jvm-map foreach! [item op & [parallel?]]
tech.v3.datatype.statistics quartile-outlier-fn [item & [range-mult]]
tech.v3.libs.buffered-image as-ubyte-tensor [img & _options]
tech.v3.libs.buffered-image downsample-bilinear [src-img & {:keys [dst-img-width dst-img-height dst-img-type]}]
tech.v3.libs.buffered-image draw-image! [src-img
dst-image
&
{:keys
[src-x-offset
src-y-offset
src-rect-width
src-rect-height
dst-x-offset
dst-y-offset
dst-rect-width
dst-rect-height
interpolation-type]}]
tech.v3.parallel.for doiter [varname iterable & body]
tech.v3.parallel.for indexed-doiter [idxvarname varname iterable & body]
tech.v3.parallel.for parallel-for [idx-var num-iters & body]
tech.v3.parallel.for pmap [map-fn & sequences]
tech.v3.parallel.for upmap [map-fn & sequences]
tech.v3.parallel.queue-iter queue-fn [src-fn & [options]]
tech.v3.parallel.queue-iter queue-iter [iter & [options]]
tech.v3.tensor ->jvm [item & args]
tech.v3.tensor ->tensor [data & args]
tech.v3.tensor clone [tens & args]
tech.v3.tensor construct-tensor [buffer dimensions & args]
tech.v3.tensor mget [t x y z & args]
tech.v3.tensor mset! [t x y z w & args]
tech.v3.tensor new-tensor [shape & args]
tech.v3.tensor select [t & args]
tech.v3.tensor.color-gradients colorize [src-tens
gradient-name
&
{:keys
[data-min
data-max
alpha?
check-invalid?
invert-gradient?
gradient-default-n],
:or {gradient-default-n 200}}]
tech.v3.tensor.color-gradients colorize->clj [src-tens gradient-name & options]
tech.v3.tensor.dimensions select [dims & args]
daslu commented 1 year ago

Reviewing the cases above, here are the ones where varargs are used to pass optional arguments of some kind:

:ns :name :arglist
tech.v3.datatype.ffi c->string [data & [encoding]]
tech.v3.datatype.ffi define-foreign-interface [rettype argtypes & [{:as options}]]
tech.v3.datatype.ffi define-library-interface [fn-defs
&
{:keys [_classname check-error symbols libraries _header-files], :as opts}]
tech.v3.datatype.ffi string->c [str-data & [{:keys [encoding], :as options}]]
tech.v3.datatype.functional quartile-outlier-fn [item & args]
tech.v3.datatype.gradient diff1d [data & [options]]
tech.v3.datatype.jvm-map foreach! [item op & [parallel?]]
tech.v3.datatype.statistics quartile-outlier-fn [item & [range-mult]]
tech.v3.libs.buffered-image as-ubyte-tensor [img & _options]
tech.v3.libs.buffered-image downsample-bilinear [src-img & {:keys [dst-img-width dst-img-height dst-img-type]}]
tech.v3.libs.buffered-image draw-image! [src-img
dst-image
&
{:keys
[src-x-offset
src-y-offset
src-rect-width
src-rect-height
dst-x-offset
dst-y-offset
dst-rect-width
dst-rect-height
interpolation-type]}]
tech.v3.parallel.queue-iter queue-fn [src-fn & [options]]
tech.v3.parallel.queue-iter queue-iter [iter & [options]]
tech.v3.tensor ->jvm [item & args]
tech.v3.tensor ->tensor [data & args]
tech.v3.tensor clone [tens & args]
tech.v3.tensor construct-tensor [buffer dimensions & args]
tech.v3.tensor new-tensor [shape & args]
tech.v3.tensor select [t & args]
tech.v3.tensor.color-gradients colorize [src-tens
gradient-name
&
{:keys
[data-min
data-max
alpha?
check-invalid?
invert-gradient?
gradient-default-n],
:or {gradient-default-n 200}}]
tech.v3.tensor.color-gradients colorize->clj [src-tens gradient-name & options]
cnuernber commented 1 year ago

Filtering this further to exclude functions that do in fact take option maps as an optional argument but are destructuring it (which is a performance issue but not what we are fixing here) in the declaration.

I noticed also that there are namespaces that should be removed as they didn't provide useful.

My strategy is going to make the project as correct as possible and either not add deprecated namespaces or do so only when someone really complains as they are actually difficult to do correctly and increase the code size of the finished library.

cnuernber commented 1 year ago

Fixed in changelist #

Cortys commented 1 year ago

Sorry for posting on an already closed issue, but I just saw the recent commit for this and was wondering whether a breaking change is necessary here.

Since Clojure 1.11, functions with keyword varargs accept a single options map in addition to the non-wrapped notation, the current dtype API should therefore already support both invokation patterns for most functions:

(require '[tech.v3.tensor :as dtt])
(dtt/new-tensor [10 10] :datatype :int8) ; Works
(dtt/new-tensor [10 10] {:datatype :int8}) ; Also already works (using Clojure >= 1.11)

By using (fn [& {:keys [...]}]) over (fn [& [{:keys [...]}]]) everywhere, both invokation patterns would be supported with fewer breaking changes.

cnuernber commented 1 year ago

This is a very good point - I think worth it to try first.