clj-python / libpython-clj

Python bindings for Clojure
Eclipse Public License 2.0
1.05k stars 68 forks source link

tqdm not printing #216

Closed behrica closed 1 year ago

behrica commented 1 year ago

Doing this

(py/run-simple-string "from tqdm import tqdm")
(py/run-simple-string "from time import sleep")
(def s "with tqdm(total=100) as pbar:\n    for i in range(10):\n        sleep(1)\n        pbar.update(10)")
(py/run-simple-string s)

does not print anything in the repl.

jjtolton commented 1 year ago

What is tqdm?

behrica commented 1 year ago

https://github.com/tqdm/tqdm

Nearly every python library is using it for progress report.

behrica commented 1 year ago

I am not sure, if something can be done on te level of libpython-clj I think tqdm tries to detect of it has a "tty" or not.

behrica commented 1 year ago

I think the "root cause" of the non printing of tqdm in libpython-clj is that stderr is not a tty:

(py/run-simple-string "import sys;print(sys.stderr.isatty())")
-> false

while in a python shell:

>>> sys.stderr.isatty()
True
behrica commented 1 year ago

There is something strange ongoing. the code here: https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L37

Seems to set the isatty() for both (out and err) to false.

But verifying it via "run-simple-string", shows that err -> false, and stdin -> true

jjtolton commented 1 year ago

Interesting. Well, that is technically correct, isn't it? Repl work with libpython-clj is not tty. Or is this for scripting?

behrica commented 1 year ago

A python interpreter is a tty. So I would expect the same for the python process started via libpython-clj running in a REPL.

behrica commented 1 year ago

I get printing by patching:

https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L34

to

 "flush" (py-class/make-tuple-instance-fn
              (fn [self] (.flush *err*)))
behrica commented 1 year ago

This is my complete "test code", so we know what we talk about: (run in a "fresh" repl)

  (require '[libpython-clj2.python :as py])
  (py/initialize!)
  (def _ (py/run-simple-string "from tqdm import tqdm"))
  (def _ (py/run-simple-string "from time import sleep"))
  (def _ (py/run-simple-string "with tqdm(total=100) as pbar:\n    for i in range(10):\n        sleep(1)\n        pbar.update(10)"))
behrica commented 1 year ago

This code should print something (in realtime(, so every second for 10 seconds, this is behavoiur in teh python interpreter. But it does not in current `libpython-clj.

behrica commented 1 year ago

I can "trigger" the printing manually, by "doing something" after th above lines, which make Clojure print someting to "err" or "flush it". (it will not print every second, but all togerther at the end)

 (require '[libpython-clj2.python :as py])
  (py/initialize!)
  (def _ (py/run-simple-string "from tqdm import tqdm"))
  (def _ (py/run-simple-string "from time import sleep"))
  (def _ (py/run-simple-string "with tqdm(total=100) as pbar:\n    for i in range(10):\n        sleep(1)\n        pbar.update(10)"))
  (.flush *err*)

Doing this a pressing "return" in the repl !!! afterwards will the print the full progress in one go.

-> there is flushing problem.

behrica commented 1 year ago

Changing here: https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L34

which does "nothing" on flush into:

 "flush" (py-class/make-tuple-instance-fn
              (fn [self] (.flush *err*)))

so force teh flushing of err results in a fully working progress bar, (so it prints every second for 10 seconds) (the isatty being false or true in our code seem not make a dfference)

behrica commented 1 year ago

We have teh same issue with stdout. This

 (def _ (py/run-simple-string "for i in range(10):\n        sleep(1)\n        print( 1)"))

does neither print something every second. It does so at the end (all is printed after 10 seconds)

behrica commented 1 year ago

Doing the same fix for out (as above), makes this working:

(def _ (py/run-simple-string "for i in range(10):\n        sleep(1)\n        print( 1, flush=True)"))

but not

(def _ (py/run-simple-string "for i in range(10):\n        sleep(1)\n        print( 1,)"))

Better said: In teh python interpreter stdoud seem to be se to "auto-flush" or similar.

(but without the above patch, neither "stdout" nor "stderr" can be flushed from python code. (only from acting on the Java streams)

behrica commented 1 year ago

So for me it seems clearly wrong to implement "flush" empty in here: https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L34

behrica commented 1 year ago

tqdm checks "isatty" only if called with tqdm(disable=None) (vs True, False) This is documented: https://github.com/tqdm/tqdm/blob/87d253f65621884c9a4020fecabc7824029e2358/tqdm/std.py#L901

and I could verify it by playing with the isatty implementaion of teh io-redirection.

So think we need to do even these 2 changes:

  1. Implement flush with "flushing" here: https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L34 (via Java/Clojure call equivalent to .flush of teh relevant stream (err / out)
  2. implement isatty as "true" in: https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L36
behrica commented 1 year ago

If we wanted to fully mimic the python interpreter we should maybe implement (otionaly) "flush after each write" in https://github.com/clj-python/libpython-clj/blob/11140d6d6b296da34a7d91ba3212e2323644452f/src/libpython_clj2/python/io_redirect.clj#L26

behrica commented 1 year ago

I foundsome comments on "python autoflush":

` In fact this is the default behavior of the underlying stdio functions.

When the output is to console, the stream will be automatically flushed when a newline is encountered, but not other characters.

If the output is not a console, then even newline won't trigger a flush.

If you want to make sure about flush, you can tell the print() explicitly: `

behrica commented 1 year ago

The problem can be reproduced without tqdm. The following does not print:

(py/run-simple-string "for i in range(10):\n        sleep(1)\n        sys.stderr.write('#')\n        sys.stdout.flush()")

So even the enforcing of flushing does not make it work. (unless I call (.flush *err*) afterwards.)

jjtolton commented 1 year ago

Great research! Sounds like you have most of a PR ready to go for this! Would be interested to take a look and test it.

behrica commented 1 year ago

Yes, but it would be very useful if @cnuernber could comment why it was implemented as it is. I can definitely say, that the change would fix the issue above, but I have no idea, if there can be any downsides.

Because then we needed to to make this controllable by the user, when he starts up libpython-clj.

The python interpreter has for example a "-u" setting for this.

I want to add as well that the behavior above is different in "embedded mode". There the flushing does happen.

cnuernber commented 1 year ago

istty and flush were stubbed out more or less to get tensorflow to work. I have no opinion on the best path here - I agree a PR would be best.

behrica commented 1 year ago

Do you have some example code using tensorflow somewhere ? So I can try, if it still works after change.

cnuernber commented 1 year ago

If I remember right it was just '(import-module 'tf')`. In any case, lets get a PR and I can work with it.

behrica commented 1 year ago

In my view , this behaves as well wrong:

(py/run-simple-string "print('ok');sys.stdout.flush();sleep(10)")

It prints only after 10 second, because then by "coincidence", the JVM flushes the output stream, because the REPL prints something after every command executed. So "python by itself" cannot flush the streams out and err

cnuernber commented 1 year ago

Ah - right - got it. I have flush added into the class now - do we need to change the istty implementation?

behrica commented 1 year ago

good question.

In my tests I did not make a difference. The python definition says:

Python file method isatty() returns True if the file is connected (is associated with a terminal device) to a tty(-like) device, else False.

behrica commented 1 year ago

So either Java has something like that, which I don't think, it's the case, Or we assume "Clojure runs mostly in REPL"... --> true

and make it configurable

behrica commented 1 year ago

There is (System/console), not exactly the same but could be used:

https://stackoverflow.com/questions/1403772/how-can-i-check-if-a-java-programs-input-output-streams-are-connected-to-a-term

cnuernber commented 1 year ago

OK, what I have right now is that I just implemented flush. I recommend we leave it there.