Textualize / rich

Rich is a Python library for rich text and beautiful formatting in the terminal.
https://rich.readthedocs.io/en/latest/
MIT License

progress bar(s) and progress_hook(s) #975

Closed: inktrap closed this issue 3 years ago

inktrap commented 3 years ago

Hi there!

Sorry in advance if this seems like a novel …

First, I'd like to say that I'm really happy to have found rich :) It seems like an easy way to build beautiful applications, and I was particularly impressed by the non-flickering progress bars. I saw that there have been quite a few issues about how to use them in a multiprocessing/threaded setup (#146, #121), and that you gave excellent feedback and explanations there.

My issue is the following: I want to use youtube-dl in a threaded way (with huey, a task queue). youtube-dl only reports progress through one or more progress_hooks: callbacks that are each called with a dictionary of progress info.

So I understand that I have to update the main progress object, not some copy passed into a function. I understand that I can put messages on a queue and receive them elsewhere, and also that I could write them to a db and read them back; however, I would like to do this without a db, because progress info seems so … impermanent?

I also can't seem to set the filename or the total number of (estimated) bytes, which I need because both may well change during the download (e.g. the video file has been downloaded and the audio is downloaded separately, … or the estimate is updated).
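Both values can be re-sent from the hook on every call: Rich's `Progress.update()` accepts `total=` and `completed=` at any time, and any extra keyword arguments (such as the `filename` field read by `{task.fields[filename]}` in the example below) are stored as task fields. The sketch below assumes youtube-dl's documented hook keys (`status`, `filename`, `downloaded_bytes`, `total_bytes`, `total_bytes_estimate`); `hook_to_update_kwargs` is a hypothetical helper name, and it only builds the keyword arguments so it can run without Rich installed.

```python
import os
from typing import Any, Dict


def hook_to_update_kwargs(d: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a youtube-dl progress-hook dict into keyword arguments
    for rich's Progress.update() (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if d.get("status") != "downloading":
        return kwargs
    # Prefer the exact size; fall back to the estimate. Because the estimate
    # (or the file itself) can change, total is re-sent on every call.
    total = d.get("total_bytes") or d.get("total_bytes_estimate")
    if total:
        kwargs["total"] = total
    # downloaded_bytes is cumulative, so it maps to `completed`, not `advance`
    if d.get("downloaded_bytes") is not None:
        kwargs["completed"] = d["downloaded_bytes"]
    # Extra keywords end up in task.fields, e.g. {task.fields[filename]}
    if d.get("filename"):
        kwargs["filename"] = os.path.basename(d["filename"])
    return kwargs
```

Inside the hook this would be used as `progress.update(task_id, **hook_to_update_kwargs(d))`. When youtube-dl moves on to a separately downloaded audio stream, the new `filename` and `total` simply overwrite the previous values on the same task.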

I tried to adapt your downloader.py example, but it doesn't work. I'll append my example; keep in mind that it is only a proof of concept, not clean code I intend to run as-is.

This might belong in a separate issue, but I'm adding it here because the topics are related and I don't want to bother you with multiple issues. I would like to have sortable progress bars, e.g. sorted by ETA or total file size. I understand that rich is only for presentation and that you don't want to include code for things like sorting tables. However, if I implemented the rich protocol for a Progress data type that is built after the info is pulled from e.g. a db, wouldn't I have to implement some busy-waiting loop and get flickering in exchange for a sortable table?!

inktrap commented 3 years ago
#!/usr/bin/env python3
"""
A rudimentary youtube-dl downloader to demonstrate Rich progress bars,
adapted from Rich's downloader.py example.
"""

from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue
from typing import Optional
import functools
import sys

import youtube_dl

from rich.progress import (
    BarColumn,
    DownloadColumn,
    TextColumn,
    TransferSpeedColumn,
    TimeRemainingColumn,
    Progress,
    TaskID,
)

progress = Progress(
    TextColumn("[bold blue]{task.fields[filename]}", justify="right"),
    BarColumn(bar_width=None),
    "[progress.percentage]{task.percentage:>3.1f}%",
    "•",
    DownloadColumn(),
    "•",
    TransferSpeedColumn(),
    "•",
    TimeRemainingColumn(),
)

# Worker threads put (task_id, field, value) tuples here and the main thread
# applies them to the shared Progress object. queue.Queue (not
# multiprocessing.Queue) is the right queue type for threads.
queue = Queue()

# other idea would be to write this output into a db
# but that would lead to sortable, but flickering output
# on the other hand I want to use huey (a task queue) and could implement an
# update_status method for it like in celery, so I could do this db-less
def progress_hook(task_id: TaskID, d: dict) -> None:
    # Called by youtube-dl in the worker thread with a status dict
    if d["status"] == "downloading":
        # The size may only be estimated at first and can change mid-download
        total = d.get("total_bytes") or d.get("total_bytes_estimate")
        if total:
            queue.put((task_id, "total", total))
        if "downloaded_bytes" in d:
            # downloaded_bytes is cumulative, so it maps to `completed`, not `advance`
            queue.put((task_id, "completed", d["downloaded_bytes"]))
        if "filename" in d:
            queue.put((task_id, "filename", d["filename"]))


def run_download(task_id: TaskID, url: str, opts: Optional[dict] = None) -> int:
    this_hook = functools.partial(progress_hook, task_id)
    # youtube-dl's option is "progress_hooks" and expects a list of callables
    opts = {**(opts or {}), "quiet": True, "progress_hooks": [this_hook]}
    with youtube_dl.YoutubeDL(opts) as dl:
        return dl.download([url])


if __name__ == "__main__":
    if sys.argv[1:]:
        urls = sys.argv[1:]
        with progress:
            with ThreadPoolExecutor(max_workers=4) as pool:
                futures = []
                for url in urls:
                    task_id = progress.add_task("download", filename="", start=False)
                    queue.put((task_id, "total", 1000))  # placeholder until the real size arrives
                    progress.start_task(task_id)
                    this_status = pool.submit(run_download, task_id, url, {})
                    futures.append(this_status)
                # Keep draining until every download has finished; a single
                # `while not queue.empty()` pass ends before the workers report anything.
                while not all(f.done() for f in futures) or not queue.empty():
                    try:
                        task_id, field, value = queue.get(timeout=0.1)
                    except Empty:
                        continue
                    # Progress has no update_task(); pass the field as a keyword to update()
                    progress.update(task_id, **{field: value})
    else:
        print("Usage:\n\tpython downloader.py URL1 URL2 URL3 (etc)")
willmcgugan commented 3 years ago

Hi @inktrap ,

I haven't run your code, but I think the problem is that you have created a pool but haven't called anything with it. You would need to at least call submit or map to run a function in the worker threads.

However, this may be a moot point because you won't need this approach when working with threads. Since threads share memory, you can call progress.update as normal in your 'progress hook' (no need for a queue or a thread pool).
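A minimal sketch of the queue-free approach, assuming youtube-dl's `progress_hooks` option (a list of callables, each called with one status dict). To keep it runnable offline, the download is simulated and `recording_update` stands in for `progress.update`; `make_opts`, `fake_download` and `recording_update` are hypothetical names. Rich's `Progress` guards its task state with an internal lock, so in real code `progress.update` itself can be passed in.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

UpdateFn = Callable[..., None]


def progress_hook(update: UpdateFn, task_id: int, d: Dict[str, Any]) -> None:
    """youtube-dl-style hook: runs in the worker thread and updates directly."""
    if d["status"] == "downloading":
        update(task_id, total=d.get("total_bytes"), completed=d["downloaded_bytes"])


def make_opts(update: UpdateFn, task_id: int) -> Dict[str, Any]:
    # Bind update and task id up front; youtube-dl calls hook(d) with one argument
    hook = functools.partial(progress_hook, update, task_id)
    return {"quiet": True, "progress_hooks": [hook]}


def fake_download(opts: Dict[str, Any], size: int) -> None:
    # Stand-in for youtube_dl.YoutubeDL(opts).download([url]): fires the hooks
    for done in range(0, size + 1, size // 4):
        for hook in opts["progress_hooks"]:
            hook({"status": "downloading", "downloaded_bytes": done, "total_bytes": size})


calls: List[Tuple[int, Dict[str, Any]]] = []
lock = threading.Lock()


def recording_update(task_id: int, **kwargs: Any) -> None:
    # Stand-in for progress.update; the lock mimics Progress's internal lock
    with lock:
        calls.append((task_id, kwargs))


# One worker per download; no queue and no draining loop in the main thread
with ThreadPoolExecutor(max_workers=4) as pool:
    for task_id, size in enumerate([400, 800, 1200]):
        pool.submit(fake_download, make_opts(recording_update, task_id), size)
```

With Rich, `make_opts(progress.update, task_id)` and a real `youtube_dl.YoutubeDL(opts).download([url])` call would take the place of the stand-ins, and the display refreshes on its own while the workers run.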

Re sortable progress bars, Rich doesn't support this out of the box, but you can extend the Progress class and override this method. You can sort self.tasks there to change the order they are displayed.

Hope that helps.

inktrap commented 3 years ago

Sorry I didn't reply sooner, and thanks for your initial answers; I have some additional questions and results ;)

I have a call to submit in there, see the if __name__ == "__main__": block:

                    this_status = pool.submit(run_download, task_id, url, {})

Again, this is only your slightly changed downloader example.

Secondly, I tried to render tables of Task objects, in two variations:

a) add an additional method that sorts the internal task table, so I can still use progress.add_task and so on.

from rich.progress import Progress
from rich.table import Table

class ProgressList(Progress):

    def get_tasks_table(self, sort: str = "filename") -> Table:
        # self.tasks returns a snapshot list of Task objects
        # (the keys, if needed, are [task.id for task in tasks])
        tasks = sorted(self.tasks, key=lambda task: str(task.fields.get(sort, "")))
        # make_tasks_table() is an ordinary method, so call it on self
        return self.make_tasks_table(tasks)

I get recursion errors if I try to render the result of get_tasks_table() with rich.live. Are you interested in this?

b) construct a list of tasks by calling Task() directly and creating the task IDs myself (which seems like reinventing the wheel, to be honest). That doesn't seem like a good idea.

Sooo … what would be the best practice for constructing a list of Task objects for make_tasks_table()?

willmcgugan commented 3 years ago

Try without the queue. You're not going to need it with threads.

> I get recursion errors if I try to render the result of get_tasks_table() with rich.live. Are you interested in this?

Live and Progress will not work in tandem at the moment.

As for sorting, it may be better to do it in get_renderables. Here is the current version:

    def get_renderables(self) -> Iterable[RenderableType]:
        """Get a number of renderables for the progress display."""
        table = self.make_tasks_table(self.tasks)
        yield table

You can change it to something like the following:

    def get_renderables(self) -> Iterable[RenderableType]:
        """Get a number of renderables for the progress display."""
        tasks = sorted(self.tasks, key=<YOUR KEY FUNCTION>)
        table = self.make_tasks_table(tasks)
        yield table
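For the key function, ETA and total size are the two orderings asked about earlier. Rich's `Task` exposes `time_remaining` and `total`, and either may be `None` (for example, before a speed estimate exists), so a sketch of the keys should push unknown values to the end. The names `by_eta` and `by_total_size` are hypothetical, and a small dataclass stands in for `Task` so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeTask:
    # Stand-in for rich.progress.Task: only the attributes the keys read
    id: int
    total: Optional[float]
    time_remaining: Optional[float]


def by_eta(task) -> Tuple[bool, float]:
    # Soonest-finishing first; tasks without an ETA yet sort last
    eta = task.time_remaining
    return (eta is None, eta if eta is not None else 0.0)


def by_total_size(task) -> Tuple[bool, float]:
    # Largest downloads first; unknown sizes sort last
    return (task.total is None, -(task.total or 0.0))
```

Inside a `Progress` subclass, `sorted(self.tasks, key=by_eta)` is then the list handed to `make_tasks_table()` in `get_renderables()`. Returning a tuple whose first element is "is unknown" keeps `None` out of the numeric comparison.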

Concurrency is a tricky thing, even if you are experienced in it. I'd suggest you get it working perfectly with a single download, before trying to extend it to multiple downloads.

inktrap commented 3 years ago

Thanks for the quick reply! :)

Yes, I had noticed get_renderables() before. I'll look at this over the next few days: first I'll try to come up with a non-queue, single-threaded get_renderables example, and then I'll try to use huey's multithreading capabilities. I'll post the results of both steps here, if that's alright with you.

Mmmh, since get_renderables() is just make_tasks_table(self.tasks), that answers my question of whether I should construct tasks on my own ;). However, how can I initialize Progress() so it won't show the bars/fields when I add tasks? I tried it with a specific console and/or disable=True, but then I get no output at all. I would be glad about some documentation on how to use get_renderables() and what needs to be set up beforehand.

nathangamz commented 1 year ago

Hey! Just wondering whether you figured it out, and if so, whether there's any way to see what you came up with?