Some other metric of your lastest results

liuxingpeng520521 commented 6 months ago

I'm wondering to know what is the average number of inferences in your latest test results and how time consuming are they? If it's convenient, I want to know how about them in success and failure theorems respectively. Something like: AVg.Inferences in Total, AVg.Inferences on Failure, AVg.Inferences on Pass, Max.Inferences Allowed, AVg.Time in Total, AVg.Time on Failure, AVg.Time on Pass, Max.Time Allowed.

HazardousPeach commented 6 months ago

Proverbot9001 uses depth and tree width limits, not time or inference limits (at least, in the mode where the best results are), so you're not going to find anything for max time allowed or max inferences allowed. And unfortunately, we haven't been storing individual proof times, so getting average time on failure or pass isn't going to be possible yet. However, the rest should be computable from the published report.

For average time in total, you can take the total time taken in that report, divide by the number of proofs total, and multiply by 64 (for 64 threads). Unfortunately it's not a great comparison point, because the distribution is extremely long-tail (the last 8-10 proofs take over half the time, and should probably just have been cut short), and Proverbot9001 was never optimized for average time.

For the inference statistics, things are a lot better. You can access the json for a bit more detail about each proof. Just take the report url (https://proverbot9001.ucsd.edu/reports/2024-01-26T19d27d24-0700+a13e20f06b2535d9abd20853749b0709b280e054), without the index.html part if that's there, and append -proofs.txt for each test file. So, to look at the json for the CompCert/lib/Parmov.v file, you would go to the url https://proverbot9001.ucsd.edu/reports/2024-01-26T19d27d24-0700+a13e20f06b2535d9abd20853749b0709b280e054/Parmov-proofs.txt. In there is a json object for every proof in the file, including the status of the proof (success, failure, incomplete, or crashed), and how many inferences were needed (labeled "steps_taken"). So, for instance, the fourth line in that file shows that we were able to prove the dests_decomp lemma in lib/Parmov.v only making 3 inferences, because we immediately guessed intros and apply dests_append. You should be able to write a script which given these json files will give you the avg inferences, avg inferences on failure, and avg inferences on pass.

HazardousPeach commented 6 months ago

Oh, and to be clear, you can get the timing numbers for individual proofs too, you would just have to modify the code and reproduce the results separately. Let me know if you want to do that and need guidance.

UCSD-PL / proverbot9001

Some other metric of your lastest results #122