fhanau / Efficient-Compression-Tool

Fast and effective C++ file optimizer
Apache License 2.0
596 stars 41 forks

Add progress information #115

Open K0-RR opened 2 years ago

K0-RR commented 2 years ago

I'm currently compressing about 2500 images. There is no ETA or completion percentage, so I have no idea whether it's going to take a few hours or many days. Please add at least something like this: Processed 135 (100 MB) out of 500 files (900 MB)

fhanau commented 2 years ago

This is similar to your other issue, but I think it may be out of scope. Being able to display progress would require a careful design so it works well for a number of different use cases. I consider the folder support of ECT to be a convenience feature and not its main function, so adding progress information to see how long it takes to compress one big folder is not what I plan to focus on. If you are using ECT on larger amounts of data, I would recommend using a script to process files one at a time, which lets you print progress or any additional information as you go through the folder and makes the process more flexible. When compressing larger amounts of data with a high compression setting, the run time can certainly be high, so you can run the program on a subset of the files or with a lower setting (e.g. -3) first to get a sense of how long it might take.
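The per-file scripting approach suggested above could look roughly like the sketch below. It is only an illustration, not part of ECT: `progress_run` and the `OPTIMIZER` override are hypothetical names introduced here, the file extensions are an assumption, and file names containing newlines are not handled. It counts the matching files first, then runs the optimizer on each file in sorted order and prints a counter before each one.

```shell
#!/bin/sh
# Sketch: optimize files one at a time and print a running counter.
# progress_run and OPTIMIZER are hypothetical names, not ECT features.
# OPTIMIZER defaults to "ect -3"; override it to try another command or level.

progress_run() {
    dir=$1

    # Count matching files up front so a total can be shown.
    total=$(find "$dir" -type f \( -iname '*.png' -o -iname '*.jpg' \) | wc -l)
    total=$((total))  # drop the padding some wc implementations add

    count=0
    # Sorted order makes it easier to see how far through the folder we are.
    find "$dir" -type f \( -iname '*.png' -o -iname '*.jpg' \) | sort |
    while IFS= read -r file; do
        count=$((count + 1))
        printf '[%d/%d] %s\n' "$count" "$total" "$file"
        # Left unquoted on purpose so "ect -3" splits into command and option.
        ${OPTIMIZER:-ect -3} "$file" || printf 'Error processing %s\n' "$file"
    done
}
```

After sourcing the file, `progress_run .` would process the current directory tree. Because each file is a separate invocation, the counter line could be extended with byte totals or timing if an ETA is wanted.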

Anasxrt commented 1 year ago

I use xargs to help,

find . -type f \( -name "*.png" -o -name "*.PNG" -o -name "*.jpg" -o -name "*.JPG" \) | xargs -n1 ect -3 -recurse -strip -keep --mt-file

but I have not yet figured out how to show each file name on the line before its optimization result.

taskhawk commented 9 months ago

I use this:

find . -type f \( -iname "*.png" -o -iname "*.jpg" \) -print0 | sort -z | xargs -0r -n 1 -P $(nproc) -I '{}' sh -c 'OUT=$(ect -3 -strip -keep "$1" | grep -v "^Processed") && echo "$1  -->  $OUT" || echo "Error processing $1"' sh '{}'

Files are delimited by a null character to avoid any funny business with weird characters in filenames. nproc returns the number of logical processors in your system to maximize performance; adjust as necessary. From my short testing, it appears to be more efficient to run more single-threaded processes than fewer multi-threaded processes. I sort the results of find to help give a sense of progress, but mainly so it can help me locate approximately which file caused issues (with an error margin of +/- the value used for -P) in case the error isn't caught by || echo "Error processing $1". The output for each file is converted to a single line because the multiple processes running at the same time could mix their multi-line output, making it difficult to know which result corresponds to which file.

However, I find the command a monstrosity and I think it would be better if ect just showed what file(s) it was working on.