deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License
34 stars 10 forks source link

Avoid printing to standard output #321

Open simleo opened 2 years ago

simleo commented 2 years ago

Several methods in the EDDL API write directly to standard output, so that library users have no control over what gets printed and when. Since this happens within each method's implementation, the Python bindings have no way to prevent it, thus Python library users are also affected. This is particularly inconvenient when using the library from a Jupyter notebook: in order to display output in notebook cells, Jupyter replaces sys.stdout with a custom ipykernel.iostream.OutStream object. However, sys.stdout is just a Python-level abstraction, that usually wraps the system's file descriptor for stdout (but not in this case), while the underlying C++ library is writing to the real stdout, so its output ends up in the terminal window where Jupyter was launched (and gets mixed up with logging output from Jupyter itself). This affects several kinds of methods, most notably:

In the case of info prints, the amount of text is relatively small and gets printed all at once, so it should probably be enough to change these methods to return a string containing the informational dump.

The case of methods that periodically report progress is a bit more complicated. One thing that should work fine is to add support for a callback function that would be passed the relevant info at each tick. For instance, this is what Python's standard library does in urlretrieve, a tool for downloading remote files:

def urlretrieve(url, filename=None, reporthook=None, data=None):
    # [...]

reporthook expects a function that takes three arguments: a block count, a block size, and the total file size. For instance, the user could write the following function to pass as reporthook:

def print_progress(count, block_size, total_size):
    perc = 100 * count * block_size / total_size
    sys.stdout.write("\r>> Getting %s %.1f%%" % (tar_name, perc))
    sys.stdout.flush()

Each time urlretrieve wants to report on its progress, it calls the print_progress function specified by the user, which now has complete freedom of what to do with the information: in this case it is simply dumped to sys.stdout, but it could be used to populate a database or update a chart instead. Even the simple example shown above should be enough to fix the Jupyter problem though.

The relevant C++ methods should be modified to take extra optional std::function arguments, which are already used in newloss and newmetric and thus should be OK to bind in Python.

salvacarrion commented 2 years ago

Thank you! We were aware of this, and in fact, we've changed quite a few methods. However, we won't deal with all these prints in the ongoing release.