Closed alanctprado closed 1 month ago
I think I have an initial version of the evaluation script:
It shows the status, the elapsed real time in seconds and the maximum total memory used. The memory part needs some improvement, but what do you think about the format?
@alanctprado
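For reference, a minimal sketch of how the timing and memory measurement could work (POSIX-only; the helper name and status strings here are my own, not what the script actually uses):

```python
import os
import subprocess
import time

def run_case(cmd):
    """Run one test case; return (status, elapsed seconds, peak RSS in MiB).

    Hypothetical helper: os.wait4 reaps the child and returns its rusage,
    whose ru_maxrss field is the peak resident set size (KiB on Linux).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    _, exit_status, rusage = os.wait4(proc.pid, 0)
    elapsed = time.monotonic() - start
    peak_mib = rusage.ru_maxrss / 1024  # ru_maxrss is in KiB on Linux
    status = "Solved" if os.waitstatus_to_exitcode(exit_status) == 0 else "Error"
    return status, elapsed, peak_mib
```

Note that `ru_maxrss` only covers the direct child, so a solver that forks workers would need per-process accounting instead.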
Looks great to me!
Also, I'm coming to the conclusion that a simple text file for configuring the evaluation would be the best option: setting everything on the command line makes runs hard to reproduce (we may want to rerun the same configuration after merging improvements, for example), and it would also make the script code less cumbersome.
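Something like this is what I have in mind — a plain key=value file. The keys shown and the parser are just an illustration, not the final format:

```python
def load_config(path):
    """Parse a minimal key=value config file (format is a sketch).

    Blank lines and lines starting with '#' are ignored, so a run's exact
    settings can be committed alongside its results for reproducibility.
    """
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config
```

A config file would then look like:

```
# evaluation settings
processes = 4
tests_dir = tests/
output = results.csv
```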
You mean with the flags used, etc.? Sounds great, actually. Can you work on this?
@alanctprado Just finished changing the evaluation script to this new format. The only thing that still bugs me is how we should set the sleep duration when waiting to launch new jobs (I have implemented this limit parameter). I guess something like 1/10 of the TL? Waiting for the whole TL seems sub-optimal.
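The throttling idea I mean is roughly this — instead of blocking for a whole TL, wake up at a fraction of it to reap finished jobs and refill the pool (a sketch; the function and parameter names are made up here):

```python
import subprocess
import time

def run_all(commands, limit, time_limit):
    """Keep at most `limit` jobs running, polling every time_limit / 10 s.

    Sketch of the discussed scheduling: short polling intervals mean a
    finished job's slot is reused quickly rather than after a full TL.
    """
    pending = list(commands)
    running = []
    while pending or running:
        # Drop finished processes, then top the pool back up to the limit.
        running = [p for p in running if p.poll() is None]
        while pending and len(running) < limit:
            running.append(subprocess.Popen(pending.pop(0)))
        if running:
            time.sleep(time_limit / 10)
```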
The script should run all the test cases in a directory and create a .csv file with the results.
The script should limit each test case to a 30-minute runtime and 8 GB of memory usage.
The .csv file columns should be the test case, the result (TLE, MLE, or Solved), the time it took to run, and the memory usage.
The script should receive as parameters the number of processes to create, the folder containing the test cases, and the path to the output file.
We need this in order to evaluate the improvements on the solver over time.
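A rough sequential sketch of the requirements above (Unix-only; all names, the `Error` status, and the MLE heuristic are assumptions, and the process-count parameter is omitted for brevity — the actual script may differ):

```python
import csv
import resource
import subprocess
import sys
import time
from pathlib import Path

TIME_LIMIT = 30 * 60          # 30 minutes, in seconds
MEMORY_LIMIT = 8 * 1024**3    # 8 GB, in bytes

def set_limits():
    # Applied in the child just before exec: the kernel enforces a hard
    # cap on CPU time and address space, so we don't have to babysit it.
    resource.setrlimit(resource.RLIMIT_CPU, (TIME_LIMIT, TIME_LIMIT))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT, MEMORY_LIMIT))

def evaluate(solver, case):
    """Run `solver` on one case; classify as Solved/TLE/MLE (a sketch)."""
    start = time.monotonic()
    try:
        proc = subprocess.run([solver, str(case)], preexec_fn=set_limits,
                              timeout=TIME_LIMIT, capture_output=True)
        elapsed = time.monotonic() - start
        peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        if proc.returncode != 0:
            # Heuristic: a non-zero exit near the memory cap counts as MLE.
            result = "MLE" if peak_kb * 1024 >= MEMORY_LIMIT else "Error"
        else:
            result = "Solved"
    except subprocess.TimeoutExpired:
        elapsed, peak_kb, result = TIME_LIMIT, 0, "TLE"
    return result, elapsed, peak_kb

def main(solver, tests_dir, output_csv):
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["test_case", "result", "time_s", "memory_kb"])
        for case in sorted(Path(tests_dir).iterdir()):
            writer.writerow([case.name, *evaluate(solver, case)])

if __name__ == "__main__":
    main(*sys.argv[1:4])
```

Note that `RLIMIT_CPU` caps CPU time while the `timeout=` argument caps wall-clock time; the spec's 30' limit presumably means the latter.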