Normalize identifier names

bobbicodes commented 2 years ago

Experimented using rewrite-clj to apply normalizations by editing the solution's parse tree.

To find the names and positional locations of the names to be normalized, we use the analysis data generated by clj-kondo:

(def identifiers
  (let [cmd ["./clj-kondo" "--lint" "./resources/example.clj" "--config"
             "{:output {:format :edn},:analysis {:locals true}}"]]
    (:locals (:analysis (edn/read-string (:out (apply shell/sh cmd)))))))

This returns a sequence of maps each containing the identifier name, row and column.

Note: Accurately determining precisely which names refer to identifiers which should be normalized (as opposed to function names, etc. which would break the solution if changed) is no small task due to the various ways that functions can be defined in Clojure. To appreciate the value provided by the clj-kondo analyzer, check out the source

To find these positions in the parse tree, we need to enable position tracking when creating the zipper:

(def impl
  (z/of-file "./resources/example.clj" {:track-position? true}))

In this example, the first name to be replaced happens to be 'xs at position [3 20]. To replace the node with a string value of "Placeholder1":

(z/replace (z/find-last-by-pos impl [3 20]) "Placeholder1")

The entire solution can be normalized likewise by iterating through the sequence of identifiers.

bobbicodes commented 2 years ago

After playing around with this for another day and getting funny results I think I realized the problem with this approach - once an edit is made to the parse tree at the desired position, it changes the positions of the rest of the names to be replaced!

So instead we need another strategy to locate the identifiers besides by their positions.

We also need to ensure that the name being replaced is indeed a local defined by the user, and not the name of a core function such as clojure.core/name, or appearing inside a string or something.

I contrived a test function containing these edge cases:

(ns test-code)

;; function using "name" as a parameter
;; also containing "name" inside a string
(defn with-name [name color]
  (let [my-fn (fn inline-fn [name2 color2]
                (str name2 " " color2))]
    (str "My name is " name
         " and my color is " color
         "and my-fn is " (my-fn "joe" "blue"))))

;; snippet using clojure.core/name
(map name (keys {:my-key "myval"}))

We need to replace only the 2 usages of "name", the first one appearing in the arglist of with-name and where it is referenced in the line (str "My name is " name, and ensure that everywhere else "name" appears is left alone.

bobbicodes commented 2 years ago

Fortunately, the clj-kondo analyzer knows the difference. However we also need to set :arglists to true in the config to include them in the results:

(def analysis
  (let [cmd ["./clj-kondo" "--lint" (str in-dir (snake-case slug) ".clj") "--config"
             "{:output {:format :edn},:analysis {:locals true :arglists true}}"]]
    (:analysis (edn/read-string (:out (apply shell/sh cmd))))))

These appear under :var-definitions:

(:var-definitions analysis)

Output:

[{:fixed-arities #{2},
  :end-row 10,
  :name-end-col 16,
  :name-end-row 5,
  :name-row 5,
  :ns test-code,
  :name with-name,
  :defined-by clojure.core/defn,
  :filename "./resources/test_code.clj",
  :col 1,
  :name-col 7,
  :end-col 49,
  :arglist-strs ["[name color]"],
  :row 5}]

bobbicodes commented 2 years ago

The :name key tells us which var definition we need to navigate to so we can edit the vector containing the arg to be replaced, without needing to use position tracking!

Let's also make it a bit more complex by making it a multi-arity function to make sure we handle that too:

(defn with-name
  ([name]
   (with-name name "white"))
  ([name color]
   (let [my-fn (fn inline-fn [name2 color2]
                 (str name2 " " color2))]
     (str "My name is " name
          " and my color is " color
          " and my-fn is " (my-fn "joe" "blue")))))

;; snippet using clojure.core/name
(map name (keys {:my-key "myval"}))

bobbicodes commented 2 years ago

Extract the list of args to replace, navigate to the first var definition in the zipper representing our parsed code, then to the next node containing the value of the first arg, and replace that node with a new node representing the symbol PLACEHOLDER1:

(let [var-def (first (:var-definitions analysis))
      args (mapcat edn/read-string (:arglist-strs var-def))]
  (-> impl
      (z/find-value z/next (:name var-def))
      (z/find-value z/next (first args))
      (z/replace (symbol "PLACEHOLDER1"))
      z/root-string
      println))

Output:

(ns test-code)

;; function using "name" as a parameter
;; also containing "name" inside a string
(defn with-name
  ([PLACEHOLDER1]
   (with-name name "white"))
  ([name color]
   (let [my-fn (fn inline-fn [name2 color2]
                 (str name2 " " color2))]
     (str "My name is " name
          " and my color is " color
          " and my-fn is " (my-fn "joe" "blue")))))

;; snippet using clojure.core/name
(map name (keys {:my-key "myval"}))

Note: The z/next function traverses the tree in depth-first order.

So far so good! Now we just need to do that for the rest of the args.

bobbicodes commented 2 years ago

Generate the map of placeholder names and output mapping.json:

(def placeholders
  (let [args (->> analysis
                  :var-definitions
                  (mapcat :arglist-strs)
                  (mapcat edn/read-string)
                  (map str)
                  set)
        placeholders (map #(str "PLACEHOLDER-" %) 
                          (range 1 (inc (count args))))]
    (spit (str out-dir "mapping.json") 
          (json/generate-string 
           (into (sorted-map-by 
                  (fn [key1 key2]
                    (< (parse-long (last (str/split key1 #"-")))
                       (parse-long (last (str/split key2 #"-"))))))
                 (zipmap placeholders args))
           {:pretty true}))
    (zipmap args placeholders)))

Output:

{"name" "PLACEHOLDER-1", "color" "PLACEHOLDER-2"}

mapping.json:

{
  "PLACEHOLDER-1" : "name",
  "PLACEHOLDER-2" : "color"
}

Note: The map entries are swapped in the file/definition for ease of lookup.

bobbicodes commented 2 years ago

Probably about time to check in the code written so far: #24

bobbicodes commented 1 year ago

Fixed by: #27

Closing this because we ended up using a totally different approach by modifying the uniquify pass of clojure.tools.analyzer to change the names to PLACEHOLDER-1, etc. instead of what the emit-hygienic-form normally does:

This still leaves many top-level vars in need of normalization however, since intermediate functions (which are arbitrarily named, e.g. for exponentiation - exp, exp, pow, etc. are so common that not replacing these leaves the representer rather ineffective.

This will be dealt with in a new issue.

exercism / clojure-representer

Normalize identifier names #23