boot-clj / boot

Build tooling for Clojure.
https://boot-clj.github.io/
Eclipse Public License 1.0
1.75k stars 179 forks source link

`println` output during lazy-seq realization in `with-eval-in` corrupts return value #683

Closed martinklepsch closed 5 years ago

martinklepsch commented 6 years ago
#!/usr/bin/env boot

(set-env! :dependencies '[[org.slf4j/slf4j-nop "1.7.25"]])

(require 'boot.pod)

(defn expected-result? [data]
  (println "Testing with data" data "\n")
  (let [expected   (map #(do (println "Found a match") %) data)
        pod-result (boot.pod/with-eval-in (boot.pod/make-pod {})
                     (map #(do (println "Found a match") %) ~data))]
    (when-not (= expected pod-result)
      (println "EXPECTED:   " expected)
      (println "POD RESULT: " pod-result))
    (= expected pod-result)))

(if (expected-result? [{:name "match-beginning"}])
  (do (println "Success, values are equal") (System/exit 0))
  (do (println "Failure, values are different") (System/exit 1)))

To reproduce: copy the above into a file repro, make it executable chmod +x repro, run it ./repro.

I reproduced this issue with 2.6.0, 2.7.2 and 2.8.0-SNAPSHOT.

EDIT updated with more minimal repro. The key thing seems to be that something is printed during iteration over a sequence.

martinklepsch commented 6 years ago

FWIW this seems to fix the issue, not sure why yet though (note the double pr-str)

diff --git a/boot/pod/src/boot/pod.clj b/boot/pod/src/boot/pod.clj
index b4f58f6..6903737 100644
--- a/boot/pod/src/boot/pod.clj
+++ b/boot/pod/src/boot/pod.clj
@@ -465,7 +473,9 @@
   ([expr]
      (let [{:keys [meta? expr]} (read-string expr)]
        (binding [*print-meta* meta?]
-         (pr-str (eval expr)))))
+         (let [r (eval expr)]
+           (pr-str r)
+           (pr-str r)))))
   ([pod expr]
      (let [arg (pr-str {:meta? *print-meta* :expr expr})
            ret (with-invoke-in pod (boot.pod/eval-in* arg))]
martinklepsch commented 6 years ago

Laziness seems to be at fault here. Using mapv in the example below solves the issue but this doesn't really help when lazy seqs are nested deep inside maps etc.

(->> (range 30)
     (map #(do (println %) %))
     (pr-str))
martinklepsch commented 6 years ago
(->> (range 30)
     (map #(do (println %) %))
     (clojure.walk/prewalk identity)
     (pr-str))

This works — does anyone see issues with this approach?

alandipert commented 6 years ago

@martinklepsch weird case but makes sense to me. Lazy things need to be fully realized 👍

martinklepsch commented 6 years ago

The example / test case is a bit contrived but I was working with Codox and it prints whenever it excludes a var from the documentation. Generation of the data that drives the documentation happens lazily. The boot-codox task is not affected by this because it never returns any data but just writes files.

arichiardi commented 6 years ago

Ok now that I understand the problem better I think I have one other idea.

I agree lazy things should be realized for side effects to happen but do we want to modify existing behavior?

We could instead expose a new API like do-all-in or something for the pod and this includes the change in behavior of the PR.

martinklepsch commented 6 years ago

I agree that this is strictly speaking a breaking change but pr-str also realizes the entire data. So you could say we only move that realization step out of pr-str. Can you think of cases where the two variants would give different results?

martinklepsch commented 6 years ago

Do we think people depend on intermixing of strings printed during the realization of the seq? FWIW I'm hesistant to add another function to that ns because:

hyrums-law

arichiardi commented 6 years ago

So if pr-str already realizes everything there wouldn't be any problem right?

I am wondering what is the missing case, what pr-str does not realize and in case we identify the why maybe there is a better approach. I am just afraid we will break things that we don't expect to break...but well..maybe I am too on the defending/scared side :smile:

martinklepsch commented 6 years ago

So if pr-str already realizes everything there wouldn't be any problem right?

pr-str realizes everything by recursively walking the data printing item by item. If the generation of any next items in a lazy seq cause printlns those will be intermixed with the printed stuff of pr.

If pr-str would not realize the the entire data it would not work very well 😛

arichiardi commented 6 years ago

Ok it seems like I cannot understand something about the issue still, or maybe I don't see it as boot issue per se:

(pr-str (eval '(map #(do (println (str "test" %) %)) (range 0 6))))

And it prints out the testX within the returned vector. Is this the problem? You want to receive all the println first and then the list of results?

The alternative (or mapv as you said):

(pr-str (eval '(doall (map #(do (println (str "test" %)) %) (range 0 6)))))

Achieves that. Maybe the debate in my own head is something else: if we did that change users won't be allowed to control the behavior of lazy seqs anymore. You could not do my case number one basically.

Is case 1 common? For sure it is the existing behavior. Do we want to enforce 2 behavior? I don't actually know :smile: I would be cautious and introduce another API, however I will yield to the majority if the change seems innocuous enough.

martinklepsch commented 6 years ago

And it prints out the testX within the returned vector. Is this the problem?

Yes

You want to receive all the println first and then the list of results?

I personally don't care about execution order, I do care about getting the unmodified return value of my functions though 🙂

If I understand you correctly: Users that rely on case 1 rely on the fact that pr-str will mix values with printed stuff. I don't see any reason why you would realistically want that but you are absolutely right, if people rely on this it is a breaking change.

rauhs commented 6 years ago

Saw the comment on slack @martinklepsch . My two cents: Is this performance critical? A super easy hack is to just pr-str twice and take the second one. Edit: Ooops, I now see that this was already considered. Excuse the noise :)

martinklepsch commented 6 years ago

@rauhs That's the first "solution" I found but I feel like clojure.walk is fundamentally doing the same but without actually turning things into strings which is probably faster? Perf is desirable if possible.

minikomi commented 6 years ago

This definitely seems like what doall was created to avoid.

martinklepsch commented 6 years ago

@minikomi IIRC the issue with doall is that it doesn't recursively walk structures. That means calling doall on a map won't fully realize lazy sequences which are part of the map vals.

burn2delete commented 5 years ago

@martinklepsh do you have a patch for this?

martinklepsch commented 5 years ago

@flyboarder not ready but I might be able to provide one later.

The gist of it is that we need to call (clojure.walk/prewalk identity (eval expr)) in the snippet mentioned in this comment: https://github.com/boot-clj/boot/issues/683#issuecomment-357938537

By fully realising all nested lazy sequences we ensure that no side effecting (println) functions mapped over those lists affect the output of the pr-str that is called later.