ericaltendorf / plotman

Chia plotting manager
Apache License 2.0
911 stars 280 forks source link

[REQ] Add percentage using log lines like Chia GUI #297

Open Fmstrat opened 3 years ago

Fmstrat commented 3 years ago

The Chia GUI app estimates percentages based on the number of lines in the log files. This has been detailed here: https://www.reddit.com/r/chia/comments/mx3iy1/gui_plotting_and_percent_done/

For a u=128 plot for instance, it totals out at 2626 lines in the log file.

With that in mind, you can calculate out the percent progress in bash:

wc -l /opt/chia/logs/2021-05-09-16:46:30.log |awk '{print $1}'); PROGRESS=$(bc <<< "scale=5;100/2626*$LINES"); echo $(printf %.$2f $PROGRESS

Since it's python, the code to mirror this could be pulled from the GUI: https://github.com/Chia-Network/chia-blockchain-gui/blob/main/src/modules/plotQueue.ts#L11

Specifically (with log being the text of the log):

const FINISHED_LOG_LINES = 2626; // 128
const lines = log.trim().split(/\r\n|\r|\n/).length;
progress = lines > FINISHED_LOG_LINES ? 1 : lines / FINISHED_LOG_LINES;

Could we get this same calculation into interactive and status modes?

Fmstrat commented 3 years ago

Well, that proved easy. The PR I submitted is working for me

altendky commented 3 years ago

Lines in the log doesn't strike me as a particularly interesting measure of completion. Is there a useful relationship between that and time through the plot? Even presuming that plots are progressing at a consistent rate from one to the next.

Fmstrat commented 3 years ago

@altendky The timing does seem mostly linear and in testing fairly accurate. I'm sure this is why the Chia team chose this method in the official chia-blockchain-gui

altendky commented 3 years ago

Interesting. I just did a quick glance and it seems like my jobs tend to spend nearly half their time in phase 1 which is only 800 of 2600 or so lines. It seems like this would depend on the thread count specified. But, perhaps it is enough to be a useful addition.

BasilHorowt commented 3 years ago

My jobs generally spend half their time in phase 1 too.

About this approach in general: what happens when the drive is full or disappears and it just keeps writing "retrying" lines? Won't it eventually incorrectly approach 100%? I would be quite dismissive of this idea except for the fact that the official gui is apparently doing it....

hackerzhuli commented 3 years ago

I think it is reasonable to get a percentage this way, using the phases:

in job.py:

class Phase:
...

    def get_percentage(self):
        if self.major == 1:
            return (self.minor - 1) / 7 * 0.45    
        elif self.major == 2:
            return (self.minor - 1) / 7 * 0.15 + 0.45    
        elif self.major == 3:
            return (self.minor - 1) / 6 * 0.35 + 0.60
        elif self.major == 4:
            return 0.95    
        return 0      

example log file from my computer, many lines are skipped for brevity:

2021-05-16T02:53:01.697  chia.plotting.create_plots       : INFO     Creating 1 plots of size 32, pool public key:  aa6a801dfb2df9306dde50f1fad5d71e05d2b095c2b7642a20658fdb9ceca379c3059b9022144aa31a49f108139ad207 farmer public key: af0b5de5e0810acb0a83d5caadf9626ae1c72a7491b1e1a7870a250f7fa0bc5e4ff7a2c7dd8edd58c7d80283468b5616
2021-05-16T02:53:01.706  chia.plotting.create_plots       : INFO     Memo: aa6a801dfb2df9306dde50f1fad5d71e05d2b095c2b7642a20658fdb9ceca379c3059b9022144aa31a49f108139ad207af0b5de5e0810acb0a83d5caadf9626ae1c72a7491b1e1a7870a250f7fa0bc5e4ff7a2c7dd8edd58c7d80283468b56165c325f765816a6b1f17e95614a6c547d1decefeb2a113d27c0ae7c7420a0e016
2021-05-16T02:53:01.706  chia.plotting.create_plots       : INFO     Starting plot 1/1

Starting plotting progress into temporary dirs: E:\Chia\Plots\Temp and E:\Chia\Plots\Temp
ID: 2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1
Plot size is: 32
Buffer size is: 4608MiB
Using 128 buckets
Using 2 threads of stripe size 65536

Starting phase 1/4: Forward Propagation into tmp files... Sun May 16 02:53:01 2021
Computing table 1
F1 complete, time: 287.472 seconds. CPU (68.05%) Sun May 16 02:57:49 2021
Computing table 2
...
Forward propagation table time: 1905.169 seconds. CPU (119.920%) Sun May 16 03:29:35 2021
Computing table 3
...
Forward propagation table time: 2567.561 seconds. CPU (99.850%) Sun May 16 04:12:22 2021
Computing table 4
...
Forward propagation table time: 3078.870 seconds. CPU (98.130%) Sun May 16 05:03:41 2021
Computing table 5
...
Forward propagation table time: 2923.353 seconds. CPU (100.560%) Sun May 16 05:52:25 2021
Computing table 6
...
Forward propagation table time: 2492.062 seconds. CPU (111.940%) Sun May 16 06:33:57 2021
Computing table 7
...
Forward propagation table time: 2043.967 seconds. CPU (118.820%) Sun May 16 07:08:01 2021
Time for phase 1 = 15299.114 seconds. CPU (106.040%) Sun May 16 07:08:01 2021

Starting phase 2/4: Backpropagation into tmp files... Sun May 16 07:08:01 2021
Backpropagating on table 7
scanned table 7
scanned time =  321.256 seconds. CPU (9.330%) Sun May 16 07:13:23 2021
sorting table 7
Backpropagating on table 6
scanned table 6
scanned time =  208.608 seconds. CPU (62.060%) Sun May 16 07:30:40 2021
sorting table 6
sort time =  793.295 seconds. CPU (63.270%) Sun May 16 07:43:54 2021
Backpropagating on table 5
scanned table 5
scanned time =  243.433 seconds. CPU (51.740%) Sun May 16 07:47:57 2021
sorting table 5
sort time =  779.925 seconds. CPU (62.280%) Sun May 16 08:00:57 2021
Backpropagating on table 4
scanned table 4
scanned time =  256.417 seconds. CPU (48.990%) Sun May 16 08:05:14 2021
sorting table 4
sort time =  763.123 seconds. CPU (62.100%) Sun May 16 08:17:57 2021
Backpropagating on table 3
scanned table 3
scanned time =  253.902 seconds. CPU (49.090%) Sun May 16 08:22:12 2021
sorting table 3
sort time =  773.777 seconds. CPU (61.430%) Sun May 16 08:35:06 2021
Backpropagating on table 2
scanned table 2
scanned time =  235.054 seconds. CPU (52.960%) Sun May 16 08:39:01 2021
sorting table 2
sort time =  750.129 seconds. CPU (62.420%) Sun May 16 08:51:31 2021
table 1 new size: 3425200882
Time for phase 2 = 6209.960 seconds. CPU (52.060%) Sun May 16 08:51:32 2021
Wrote: 268

Starting phase 3/4: Compression from tmp files into "E:\\Chia\\Plots\\Temp\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp" ... Sun May 16 08:51:32 2021
Compressing tables 1 and 2
...
Total compress table time: 2007.046 seconds. CPU (58.210%) Sun May 16 09:24:59 2021
Compressing tables 2 and 3
...

Total compress table time: 2178.968 seconds. CPU (62.550%) Sun May 16 10:01:18 2021
Compressing tables 3 and 4
...
Total compress table time: 2147.227 seconds. CPU (63.660%) Sun May 16 10:37:05 2021

Compressing tables 4 and 5
...
Total compress table time: 2174.614 seconds. CPU (63.850%) Sun May 16 11:13:20 2021

Compressing tables 5 and 6

Total compress table time: 2240.648 seconds. CPU (64.680%) Sun May 16 11:50:41 2021
Compressing tables 6 and 7

Total compress table time: 2357.186 seconds. CPU (49.820%) Sun May 16 12:29:58 2021
Time for phase 3 = 13106.198 seconds. CPU (60.360%) Sun May 16 12:29:59 2021

Starting phase 4/4: Write Checkpoint tables into "E:\\Chia\\Plots\\Temp\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp" ... Sun May 16 12:29:59 2021
    Starting to write C1 and C3 tables
...
    Finished writing C1 and C3 tables
    Writing C2 table
    Finished writing C2 table
    Final table pointers:
    P1: 0x10c
    P2: 0x374851cd2
    P3: 0x6b5fa750b
    P4: 0x9fdca084e
    P5: 0xd55d00a36
    P6: 0x10d99a2b4b
    P7: 0x14ea3a8cda
    C1: 0x190a3aefda
    C2: 0x190a5526c6
    C3: 0x190a552776
Time for phase 4 = 1260.113 seconds. CPU (48.890%) Sun May 16 12:50:59 2021
Approximate working space used (without final file): 269.355 GiB
Final File size: 101.361 GiB
Total time = 35875.386 seconds. CPU (78.000%) Sun May 16 12:50:59 2021
Copied final file from "E:\\Chia\\Plots\\Temp\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp" to "G:\\dst\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp"
Copy time = 950.196 seconds. CPU (8.080%) Sun May 16 13:06:50 2021
Removed temp2 file "E:\\Chia\\Plots\\Temp\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp"? 1
Renamed final file from "G:\\dst\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot.2.tmp" to "G:\\dst\\plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot"
2021-05-16T13:06:50.491  chia.plotting.create_plots       : INFO     Summary:
2021-05-16T13:06:50.491  chia.plotting.create_plots       : INFO     Created a total of 1 new plots
2021-05-16T13:06:50.491  chia.plotting.create_plots       : INFO     plot-k32-2021-05-16-02-53-2a2ae9c134e9ba1f344172c1fc758bbffd6fa573a40945ada01bcb9bc75c54d1.plot

We can see that each minor phase in the same major phase will take approximately the same time, with only one exception, phase 1 table 1 takes only a fraction of time of other tables.

phase name time percent characteristic
1 Forward propagation 15299 43% cpu bound
2 Backpropagating 6209 17% cpu bound (mostly)
3 Compression 13106 37% cpu bound
4 Write Checkpoint tables 1260 4% io bound ( reading from disk )
total 35875 101%

Note:

  1. Apparently copying the final file to destination directory is not counted here
  2. This plot job uses an temp directory on an HDD
  3. the characteristic in the table is a guess
altendky commented 3 years ago

Progress is a hard thing. The only way to get it correct here, even in a steady state situation on a single computer, is to do detailed log timing analysis and use that as the reference. Remember, you have to account for the threads in stage one. I run a single thread on one system and ran 4 on another for awhile. Either that is a silly thing to do or it has a relevant effect on this problem.

I can't say I understand how this brings significant value, especially given the associated complexity. What is it that people will do with this information that they can't do similarly with the major:minor phase status? Is it mostly an interest in more fine grained status that updates more often? We could have major:minor:micro phases based on the buckets or whatever is the next layer in each phase.

Don't get me wrong, I have a basic interest in percentage progress feedback as well. I just don't understand how it helps beyond mental satisfaction or how we can make it not a big lie and also not a major investment of time.

hackerzhuli commented 3 years ago

I guess we can only make the percentage as close to linear relationship with time as we can get. Of course it is imposible. I guess showing a percentage is much easier to understand for people who don't know what phases mean and just want to know the "progess". I think just use a simple estimation is good enough for now.

I added it in my own fork of plotman, so people who use it don't have to learn what phases mean to learn progress of jobs: # plot id k tmp dst wall phase tmp pid stat mem user sys io
0 ad4d9060 32 E:\Chia\Plots\Temp G:\dst 2:47 1:3 13% 125G 15300 RUN 5.1G 1:14 0:02 -
1 65902766 32 E:\Chia\Plots\Temp G:\dst 5:33 1:6 32% 177G 9832 RUN 5.1G 3:30 0:06 -