AlexShypula opened 2 months ago
In the evaluation corresponding to the above figure, I set the number of workers to 2. I am confused: even though I increased the number of workers from 40 to 100, the overall gem5 execution time did not decrease. From the log, it seems that each test was executed the same number of times as the number of workers minus 2, which may be why increasing the number of workers did not improve efficiency. I am also confused as to why the number of CPUs used should be 2 less than the number of workers.
gem5 is a CPU-bound task, not I/O-bound, so increasing the number of workers above the number of physical or logical CPUs on your machine will likely not improve performance. I'm not sure if that's what's going on, but you may want to check: if you run `htop`, see whether the server is already at 100% utilization with 40 workers. The `use_logical_cpus` argument will also manually set the upper limit of CPUs to the number of logical CPUs on the server minus 2; here is the logic for that. The initial design choice was to prevent setting the number of workers to a very high number like 200+, which would slow down gem5 execution and could substantially increase the number of timeouts, distorting the experimental results.
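A minimal sketch of that capping logic (the function name and signature are illustrative, not the repository's actual code):

```python
import os

def effective_workers(requested: int) -> int:
    """Illustrative cap: never use more than (logical CPUs - 2) workers,
    leaving headroom for the parent process and other server tasks."""
    cpu_cap = max(1, (os.cpu_count() or 1) - 2)
    return min(requested, cpu_cap)
```

Under this scheme, requesting 100 workers on a 40-CPU server still yields only 38 gem5 processes, which matches the behavior described above.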
One issue I didn't realize is that these arguments are hard-coded; they should be passed in via the config, and the script should be modified to reflect that.
Also: taking a close look at this log, the temporary directory IDs are different, so it seems the runs are not from the same binary. When evaluating, there are likely many generations for each src program; in our paper this was usually 8. So for each src program, `input.i.txt` will be executed 8 times, for all `i` corresponding to the number of input test cases. The fact that the two programs/generations seem completely in sync may be because the 2 programs here have nearly identical execution characteristics.
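As a back-of-the-envelope illustration of how that multiplies into total gem5 invocations (the function and the example numbers are hypothetical; 8 generations per src program is the paper's usual setting):

```python
def total_gem5_runs(num_src_programs: int,
                    generations_per_src: int,
                    num_test_cases: int) -> int:
    """Each generation of each src program runs once per input test
    case, so the total simulation count multiplies quickly."""
    return num_src_programs * generations_per_src * num_test_cases

# e.g. 1000 src programs x 8 generations x 10 test cases = 80000 gem5 runs
```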
The reason we also subtract 2 is to leave some headroom on the server for other tasks, like the parent process or other processes running on the server itself, e.g. VSCode.
Thanks for your reply. I found that the reason there are fewer gem5.opt processes in the test is that there are fewer input codes, and the number of correct codes that can be verified is insufficient. When I set the `num_problems_to_evaluate` parameter in the yaml file to -1, the number of gem5.opt processes matches the number of CPUs mentioned above.
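For reference, the setting discussed would look like this in the yaml file (only the key name comes from this thread; its placement in the file is an assumption):

```yaml
# -1 = evaluate all problems rather than a truncated subset
num_problems_to_evaluate: -1
```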
Given the interest in trying to begin new experiments, it would be helpful to have a faster version of PIE for evaluation. Running all test cases for all programs in the test set with a 120-second per-test-case timeout can take many days, if not longer, to finish.
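One common pattern for keeping the workers saturated while enforcing the per-test-case timeout, sketched under the assumption that each test case can be launched as an independent subprocess (the commands here are placeholders, not PIE's actual gem5 invocation):

```python
import concurrent.futures as cf
import subprocess

def run_one(cmd: list[str], timeout_s: float = 120.0) -> str:
    """Run one simulation command, treating a hang as a timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        return "ok" if proc.returncode == 0 else "error"
    except subprocess.TimeoutExpired:
        return "timeout"

def run_all(commands: list[list[str]], workers: int) -> list[str]:
    """Fan commands out across a bounded pool so a slow or timed-out
    test case only stalls one worker, not the whole batch."""
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

Threads suffice here because each worker just blocks on a subprocess; the heavy CPU work happens in the child gem5 processes, so the pool size should still be capped near the CPU count as discussed above.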
An ideal fix would be something like the following