Yutaka-Sawada / MultiPar

Parchive tool

Best HARDWARE Options to make PAR files as fast as possible: CPU, RAM, GPU? Motherboard? NVMe? #132

Closed: DEVUVO closed this issue 1 month ago

DEVUVO commented 1 month ago

Hi, I need more information on how to make PAR files as fast as possible.

Best HARDWARE Options to make PAR files as fast as possible: CPU, RAM, GPU? Motherboard? NVMe?

Thank you. Anyone is welcome to share good information.

Yutaka-Sawada commented 1 month ago

Best HARDWARE Options to make PAR files as fast as possible: CPU, RAM, GPU? Motherboard? NVMe?

I added those options to limit the use of hardware resources. Normally, the default settings (Auto or checked) are fast for most users, so it's difficult to gain speed by changing the HARDWARE Options in MultiPar.

If you want faster operation, you had better use faster tools instead of MultiPar. Anime Tosho makes fast PAR2 tools: ParPar, a "High performance PAR2 create client for NodeJS", and a speed-focused par2cmdline fork. ParPar will be faster than MultiPar in most cases.

DEVUVO commented 1 month ago

Is it possible to make a version that uses most of the resources of the server/PC? For example, when only the exe (command-line) version is used, such as with ngPost, will the normal setup use as much RAM, CPU, and GPU power as possible? I can see options in the GUI version, but I'm not sure how SSE3, CLMUL, JIT (SSE2), and AVX2 work.

I wish you could set all default options to the maximum, such as 87% in the GUI version, with auto-detection of whether the drive is an SSD or a fast SSD such as NVMe.

I use MultiPar and I want to keep using MultiPar. Thank you.

DEVUVO commented 1 month ago

Another question: does MultiPar use the CPU's total power or per-core speed? In other words, will 16 cores / 32 threads at 3.2 GHz be stronger, or 8 cores / 16 threads at 5 GHz?

I'm building a new rig, and I can't easily find this information anywhere else.

Yutaka-Sawada commented 1 month ago

Is it possible to make a version that uses most of the resources of the server/PC

I designed the default settings to be the fastest on most recent PCs. GPU acceleration is disabled by default because it's slow on my PC. Only if you have a very fast graphics board may enabling GPU acceleration be faster. Normally, you won't get better performance by changing other settings.

In some specific PC environments (very old PCs or emulators), changing a setting may make processing faster. When a multi-core CPU has no (or little) shared cache memory, using less memory may be faster, because working in a small memory area seems to improve the CPU's cache hit rate. This odd behavior may be negligible on recent CPUs.
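The cache-hit-rate point can be illustrated with a cache-blocking sketch: split the work into slices small enough that each thread's working set fits in its share of the cache. This is a toy model, not MultiPar's code; the working-set formula (one input slice plus one slice per recovery block being accumulated), the 64-byte alignment, and the names `cache_chunk_size` and `slice_ranges` are all assumptions for illustration.

```python
def cache_chunk_size(cache_bytes: int, threads: int, recovery_slices: int,
                     align: int = 64) -> int:
    """Pick a slice length so one thread's working set fits its cache share.

    Toy model (an assumption, not MultiPar's formula): while processing one
    slice of an input block, a thread touches that input slice plus one slice
    of each recovery block it is accumulating into.
    """
    per_thread = cache_bytes // max(threads, 1)   # shared cache split evenly
    raw = per_thread // (1 + recovery_slices)     # input slice + recovery slices
    size = (raw // align) * align                 # round down to the alignment
    return max(size, align)                       # never smaller than one unit


def slice_ranges(block_bytes: int, chunk: int):
    """Yield (start, end) byte ranges that cover one block, chunk by chunk."""
    for start in range(0, block_bytes, chunk):
        yield start, min(start + chunk, block_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 32 MiB shared L3, 16 threads, 15 recovery slices.
    chunk = cache_chunk_size(32 * 1024 * 1024, 16, 15)
    print(chunk)                                  # 131072
    print(list(slice_ranges(300_000, chunk)))
```

With less shared cache per thread, the same formula yields smaller slices, which matches the observation that using a smaller memory area can be faster on such CPUs.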

When someone uses an external SSD, MultiPar cannot recognize the drive as an SSD. In this case, manually changing the setting from Auto to SSD is faster. When a drive is very slow, like a USB memory stick, there may be no difference. The detected access mode is shown in MultiPar's log.

SIMD is sometimes slow on early CPU models. For example, when AVX2 instructions are known to be slow on an old CPU, disabling AVX2 may be faster. Because SIMD speed depends on the CPU model, I put some check-boxes for manual selection. Auto-detection is good for recent CPUs.
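The relationship between auto-detection and the manual check-boxes can be sketched as a simple dispatch rule: use the best SIMD path the CPU reports, unless the user has switched it off. The feature labels, their preference order, and `pick_simd_path` are assumptions for illustration; MultiPar's real selection logic (including how CLMUL or JIT paths are chosen) may differ.

```python
# Hypothetical preference order, fastest first. A real program would measure
# or hard-code this per CPU family; these labels are illustrative only.
PREFERENCE = ("AVX2", "SSSE3", "SSE2")


def pick_simd_path(detected: set[str], disabled: set[str]) -> str:
    """Return the first preferred SIMD path that the CPU supports and the
    user has not unchecked; otherwise fall back to a plain (non-SIMD) path."""
    for name in PREFERENCE:
        if name in detected and name not in disabled:
            return name
    return "generic"


if __name__ == "__main__":
    cpu = {"SSE2", "SSSE3", "AVX2"}           # pretend auto-detection result
    print(pick_simd_path(cpu, set()))         # auto: AVX2
    print(pick_simd_path(cpu, {"AVX2"}))      # AVX2 unchecked: SSSE3
```

Unchecking a box only removes that path from consideration, so on a CPU where AVX2 is slow the next-best path is used instead.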

Another question: does MultiPar use the CPU's total power or per-core speed? In other words, will 16 cores / 32 threads at 3.2 GHz be stronger, or 8 cores / 16 threads at 5 GHz?

PAR2 calculation speed depends mostly on data transfer speed rather than on the CPU's calculation power. As an analogy, imagine many packages and an elevator: the packages are data, the elevator is data transfer, and the laborers are CPU cores.

For example, some laborers carry packages up a tall building using an elevator. There are 100 packages. Hiring 4 laborers is twice as fast as hiring 2. When 2 laborers carry 100 packages, the elevator goes up and down 50 times. When 4 laborers carry 100 packages, the elevator goes up and down 25 times. But more laborers may not improve speed much, because the elevator has a weight limit.

When the elevator can lift at most 5 laborers with 5 packages, using 8 laborers isn't twice as fast as using 4. When 8 laborers carry 100 packages, the elevator goes up and down 20 times with 5 laborers at a time. While 5 laborers ride the elevator, the other 3 just wait their turn.
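The arithmetic of the analogy, assuming each laborer carries exactly one package per trip and the elevator holds at most 5 laborers (the function name `elevator_trips` is hypothetical):

```python
def elevator_trips(packages: int, laborers: int, capacity: int) -> int:
    """Round trips needed when each laborer carries one package per trip
    and at most `capacity` laborers fit in the elevator at once."""
    per_trip = min(laborers, capacity)   # extra laborers just wait their turn
    return -(-packages // per_trip)      # ceiling division


for n in (2, 4, 8):
    print(n, "laborers:", elevator_trips(100, n, capacity=5), "trips")
# 2 laborers: 50 trips
# 4 laborers: 25 trips
# 8 laborers: 20 trips
```

Going from 4 to 8 laborers only cuts the trips from 25 to 20, because the capacity limit caps every trip at 5 packages.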

In this case, the elevator is the bottleneck for total speed. Hiring more laborers is worthless, and improving each laborer's walking speed has only a small effect. Improving the elevator's speed or lifting capacity is what matters.

Likewise, increasing the number of cores (or threads) on a CPU is worthless when there are already enough cores. When the CPU is fast enough, memory speed would be the most important factor. A large L3 cache on the CPU may partly hide slow memory.
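The same idea as a toy bottleneck model, similar in spirit to a "roofline" estimate: throughput is the smaller of total compute and memory bandwidth, and cache hits raise the effective bandwidth. All numbers and function names are hypothetical; real PAR2 performance also depends on block sizes, SIMD paths, and disk speed.

```python
def throughput(cores: int, per_core_gbps: float, memory_gbps: float) -> float:
    """Overall speed is capped by whichever is slower: total compute or memory."""
    return min(cores * per_core_gbps, memory_gbps)


def effective_bandwidth(memory_gbps: float, cache_gbps: float,
                        hit_rate: float) -> float:
    """Average bandwidth when `hit_rate` of the traffic is served from cache.
    Time per byte is averaged, so this is a weighted harmonic mean."""
    return 1.0 / (hit_rate / cache_gbps + (1.0 - hit_rate) / memory_gbps)


if __name__ == "__main__":
    # Hypothetical numbers: 5 GB/s of compute per core, 40 GB/s of DRAM bandwidth.
    for cores in (4, 8, 16):
        print(cores, "cores:", throughput(cores, 5.0, 40.0), "GB/s")
    # A better cache hit rate raises the ceiling that extra cores run into.
    print(effective_bandwidth(40.0, 200.0, 0.5))
```

In this model, 8 and 16 cores give the same 40 GB/s, because memory is already the bottleneck; only a higher `memory_gbps` or a better cache hit rate moves the ceiling.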

Now, back to your question (16 cores / 32 threads at 3.2 GHz, or 8 cores / 16 threads at 5 GHz): I don't know which CPU is faster. Even when a CPU itself is faster than others, there is another bottleneck (memory speed), so using the fastest DRAM your motherboard supports would be good. Overheating is a problem, too: recent CPUs may lower their clock speed when hot, so when your task takes a long time, the CPU may not run at full speed. Normally, high-end, expensive CPUs have more L3 cache, and a CPU with a large L3 cache may be good. But I don't have any evidence.

DEVUVO commented 1 month ago

So the most important thing about the CPU is the L3 cache size, not just per-core speed? So a 12-core 3.7 GHz CPU with 128 MB of L3 cache is faster than a 4.2 GHz CPU with 32 MB of L3 cache?

Yutaka-Sawada commented 1 month ago

So the most important thing about the CPU is the L3 cache size, not just per-core speed? So a 12-core 3.7 GHz CPU with 128 MB of L3 cache is faster than a 4.2 GHz CPU with 32 MB of L3 cache?

Yes, I think so. When I implemented CPU L3 cache optimization a while ago, enabling or disabling it made a big speed difference. While the L3 cache optimization is active, a larger cache should improve speed. But nobody has compared the difference across various CPUs, and cache behavior is difficult to test.

Yutaka-Sawada commented 1 month ago

I made a debug version that shows the speed of calculating Reed-Solomon codes. (It doesn't include other time, such as file creation or hash calculation.) I put the debug package (par2j_debug_2024-07-24.zip) in the "MultiPar_sample" folder on OneDrive. If you want to test the speed of different settings, you may use it. You can see the difference in "number of using threads", "cache optimization", or "file access mode".

Yutaka-Sawada commented 1 month ago

I modified the debug version so that CPU cache optimization can be disabled manually. Now you can compare speed with cache optimization enabled or disabled. To disable the CPU's L2 cache optimization, set the "/lcb0" option. To disable the CPU's shared L3 cache optimization, set the "/lcm0" option. When CPU cache optimizations are disabled, speed may become very slow. I put the debug package (par2j_debug_2024-07-25.zip) in the "MultiPar_sample" folder on OneDrive. I included the latest source code in the package for anyone interested in the behavior.
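To compare settings without retyping commands, a small timing harness can run the same command line once per option variant and keep the best wall time. This is a sketch under assumptions: `time_variants` is a hypothetical helper, and where options belong in a par2j command line must be taken from "Command_par2j.txt" (passed here as `insert_at`); only `/lcm0` and `/lcb0` come from the comment above.

```python
import subprocess
import sys
import time


def time_variants(base_argv, insert_at, variants, repeats=3):
    """Run a command once per option variant and keep the best wall time.

    base_argv: the command line you already use, as a list of strings.
    insert_at: index in base_argv where the extra options are inserted.
    variants:  sequences of extra options, e.g. [(), ("/lcm0",), ("/lcb0",)].
    """
    results = {}
    for opts in variants:
        argv = list(base_argv[:insert_at]) + list(opts) + list(base_argv[insert_at:])
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            subprocess.run(argv, check=True)   # raises if the command fails
            best = min(best, time.perf_counter() - start)
        results[" ".join(opts) or "(default)"] = best
    return results


if __name__ == "__main__":
    # Self-test with the Python interpreter standing in for par2j64.exe.
    print(time_variants([sys.executable, "-c", "pass"], 1, [(), ("-O",)], repeats=1))
```

Wall time also includes file creation and hashing, so the Reed-Solomon speed printed by the debug build itself is the cleaner figure. A create command run repeatedly may leave or overwrite recovery files between runs; clean them up as your setup requires.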

DEVUVO commented 1 month ago

Set the "/lcm0" option? How do I set this? Do I just add the number manually? In the text file next to the exe?

Is there any GUI version with more options? I find it really hard to keep editing txt files and then copying them to program folders.

I need to speak with you on IRC or somewhere faster than here. I have some ideas, and my time online is very limited.

Yutaka-Sawada commented 1 month ago

Set the "/lcm0" option? How do I set this? Do I just add the number manually? In the text file next to the exe?

You can set those options at the Command Prompt when you call par2j64.exe. Usage details are written in "Command_par2j.txt".

Is there any GUI version with more options?

I don't want to modify the GUI for rare usage. The MultiPar GUI may not support some options in the Option window. For debugging, I made MultiPar read a new line in the MultiPar.ini file: when you write the line par2jOption=/lcm0 in MultiPar.ini, the option is appended to par2j's command line automatically. You may also test other options, such as /lc32 for the maximum number of threads to use. I put the sample package (MultiPar_sample_2024-07-28) in the "MultiPar_sample" folder on OneDrive.
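Since editing text files by hand was the sticking point, a small helper could set that line instead. This is a sketch under assumptions not confirmed by MultiPar's documentation: the file's encoding (UTF-8 by default here), that appending the line at the end of the file is acceptable when none exists yet (if the file has [section] headers, check that the line ends up where MultiPar reads it), and the name `set_par2j_option`. Close MultiPar first and keep a backup, in case the application also writes to the file.

```python
import tempfile
from pathlib import Path


def set_par2j_option(ini_path, options, encoding="utf-8"):
    """Set (or add) the par2jOption= line in an INI-style text file.

    Assumptions: the file's text encoding is `encoding`, and appending the
    line at the end of the file is acceptable when no such line exists yet.
    """
    path = Path(ini_path)
    lines = []
    if path.exists():
        with open(path, encoding=encoding) as f:
            lines = f.read().splitlines()
    new_line = f"par2jOption={options}"
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("par2joption="):
            lines[i] = new_line                # replace the existing setting
            break
    else:
        lines.append(new_line)                 # no setting yet: add one
    # Write Windows-style line endings, as is usual for INI files on Windows.
    with open(path, "w", encoding=encoding, newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "MultiPar.ini"
    set_par2j_option(demo, "/lcm0")
    set_par2j_option(demo, "/lc32")            # replaces, does not duplicate
    print(demo.read_text())
```

Calling it a second time replaces the existing line rather than adding a duplicate, so switching between test options is a single call.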

I need to speak with you on IRC or somewhere faster than here. I have some ideas, and my time online is very limited.

No, that's impossible. I wouldn't understand what you say with my poor English skills. I sometimes use an English dictionary and an online translator.

DEVUVO commented 1 month ago

That's fine. Thank you for helping me.

Slava46 commented 1 month ago

So what did your tests show? I'm interested in your hardware and the differences between the modes.