Textualize / rich-cli

Rich-cli is a command line toolbox for fancy output in the terminal
https://www.textualize.io
MIT License

head / tail operations are slow on larger files #64

Closed jamestexas closed 4 months ago

jamestexas commented 2 years ago

Howdy -

I wanted to preface this with: If I missed a contributor guideline or anything, please let me know. I did check other issues and did not see one relevant to this.

I am somewhat new to using rich-cli (but am familiar with rich) and recently attempted to parse a somewhat large CSV file (~119 MB, 483k lines). I did not expect the whole CSV to load quickly, but I was somewhat surprised that running --head and --tail took as long as they did. Obviously they won't behave like GNU tail / head, but I took a stab at a minimal, naive change and was able to make both much faster. The change is around here; if you want, I am happy to open a PR. I'll also just put a code block of what I did. To avoid making a huge change, I kept the naive approach of iterating over every parsed row, rather than parsing the buffer stream per line, which would be more efficient for tail.


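    # deque is assumed to be imported at module level:
    # from collections import deque

    rows = iter(reader)
    if has_header:
        # The first row supplies the column names.
        header = next(rows)
        for column in header:
            table.add_column(column)

    if head is not None:
        # Pull at most `head` rows. filter(None, ...) drops the None
        # placeholders produced once the reader is exhausted (and any empty rows).
        table_rows = list(
            filter(
                None,
                (next(rows, None) for _ in range(head)),
            )
        )

    elif tail is not None:
        # A bounded deque keeps only the last `tail` rows in memory.
        table_rows = deque(rows, tail)

    else:
        table_rows = list(rows)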
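
For anyone who wants to try the idea outside of rich-cli, here is a minimal standalone sketch of the same row selection. It is not the rich-cli code: `select_rows` is a hypothetical helper name, and it uses `itertools.islice` for the head case instead of the `filter` / `next` combination above (so, unlike the snippet above, it keeps empty rows).

```python
import csv
import io
from collections import deque
from itertools import islice


def select_rows(reader, head=None, tail=None, has_header=True):
    """Return (header, rows), keeping only the first `head` or last `tail` rows.

    Hypothetical helper for illustration; not part of rich-cli.
    """
    rows = iter(reader)
    header = next(rows, None) if has_header else None

    if head is not None:
        # islice stops after `head` rows, so the rest of the file is never parsed.
        selected = list(islice(rows, head))
    elif tail is not None:
        # Every row is still parsed, but only the last `tail` are kept in memory.
        selected = list(deque(rows, maxlen=tail))
    else:
        selected = list(rows)
    return header, selected


sample = "id,name\n1,a\n2,b\n3,c\n4,d\n"
header, first_two = select_rows(csv.reader(io.StringIO(sample)), head=2)
_, last_two = select_rows(csv.reader(io.StringIO(sample)), tail=2)
print(header, first_two, last_two)
```

The head case returns as soon as enough rows have been read, while the tail case still has to walk the whole file, which matches the smaller speedup for --tail in the timings.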

These are naive benchmarks, but here is a comparison of the two (where the rich command is the installed CLI, and python3 ./src/rich_cli has my changes):

Head

└> time python3 ./src/rich_cli --head 500 large_csv.csv &> /dev/null                                           [👾 3.10.5]➜
python3 ./src/rich_cli --head 500 large_csv.csv &> /dev/null  0.83s user 0.47s system 94% cpu 1.369 total

└> time rich --head 500 large_csv.csv &> /dev/null                                                             [👾 3.10.5]➜
rich --head 500 large_csv.csv &> /dev/null  2.81s user 0.60s system 99% cpu 3.443 total

Tail

└> time rich --tail 500 large_csv.csv &> /dev/null                                                             [👾 3.10.5]➜
rich --tail 500 large_csv.csv &> /dev/null  2.95s user 0.63s system 99% cpu 3.604 total

└> time python3 ./src/rich_cli --tail 500 large_csv.csv &> /dev/null                                           [👾 3.10.5]➜
python3 ./src/rich_cli --tail 500 large_csv.csv &> /dev/null  1.93s user 0.53s system 96% cpu 2.545 total
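
The tail case improves less than head because every row is still parsed before the last 500 are kept. The stream-based alternative mentioned earlier (reading backwards from the end of the file) might look roughly like the sketch below. This is not the change benchmarked above; `tail_lines` and `block_size` are hypothetical names, and the sketch assumes UTF-8 text with no quoted CSV fields that contain newlines.

```python
import os


def tail_lines(path, count, block_size=64 * 1024):
    """Return the last `count` lines of a file, reading blocks from the end.

    Hypothetical helper for illustration; not part of rich-cli.
    """
    if count <= 0:
        return []
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Keep reading earlier blocks until more than `count` newlines are
        # buffered, which guarantees the last `count` lines are complete.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    # Only the final, complete lines are decoded, so a partial first line
    # (or a split multi-byte character at the block edge) is never touched.
    return [line.decode("utf-8") for line in data.splitlines()[-count:]]
```

The returned lines could then be passed to `csv.reader` together with the header line read from the start of the file, so only a small part of a large file is ever parsed.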

Anyway, let me know if you want me to do anything here!