david-a-wheeler / mmverify.py

Metamath verifier in Python
MIT License
35 stars 10 forks source link

Faster #7

Closed benjub closed 2 years ago

benjub commented 2 years ago

This could subsume PR #6. In addition to it (described in #4), it achieves a 3 to 4 times speedup by verifying compressed proofs directly without converting them to normal format.

Also some bug fixes (including the one mentionned in https://github.com/metamath/metamath-exe/issues/81) and code documentation.

This could be used in https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml: probably no need to split set.mm into main/mathbox thanks to speedup, and also one should probably use normal arguments instead of bash redirections to avoid the bug described in https://github.com/metamath/metamath-exe/issues/81.

benjub commented 2 years ago

@david-a-wheeler : I think it is now ready for review. I documented the code, typed it using mypy --strict, linted it with pylint and autopep8, and increased variable naming consistency and unambiguity. As for speed, here are the results showing a 4x speedup on set.mm (where mmverify0.py is the version currently used in CI):

$ perf stat python3 mmverify.py set.mm

 Performance counter stats for 'python3 mmverify.py set.mm':

         26 775,61 msec task-clock                #    0,999 CPUs utilized          
             1 429      context-switches          #    0,053 K/sec                  
                 0      cpu-migrations            #    0,000 K/sec                  
            35 575      page-faults               #    0,001 M/sec                  
   104 270 949 903      cycles                    #    3,894 GHz                    
   265 952 159 996      instructions              #    2,55  insn per cycle         
    54 148 885 223      branches                  # 2022,321 M/sec                  
       353 622 685      branch-misses             #    0,65% of all branches        

      26,801988983 seconds time elapsed

      26,688784000 seconds user
       0,088002000 seconds sys

$ perf stat python3 mmverify0.py<set.mm

 Performance counter stats for 'python3 mmverify0.py':

        106 868,65 msec task-clock                #    0,999 CPUs utilized          
             5 700      context-switches          #    0,053 K/sec                  
                 0      cpu-migrations            #    0,000 K/sec                  
           136 775      page-faults               #    0,001 M/sec                  
   420 554 488 982      cycles                    #    3,935 GHz                    
 1 150 388 824 329      instructions              #    2,74  insn per cycle         
   232 875 326 916      branches                  # 2179,080 M/sec                  
       956 284 676      branch-misses             #    0,41% of all branches        

     106,953312516 seconds time elapsed

     106,677338000 seconds user
       0,191966000 seconds sys
benjub commented 2 years ago

Thanks @tirix for your reaction. Have you had a look at my changes ? I have reread it and checked it on @david-a-wheeler's test suite, so I'm pretty confident it's correct.

Since it makes mmverify.py four times faster and mmverify.py is currently the slowest CI check in https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml, using my version would have concrete consequences (since verifiers.yml is run more or less daily). I could adapt verifiers.yml there, using my repository for mmverify.py, if @david-a-wheeler does not have time to review it here. What do you think ?

tirix commented 2 years ago

Hi Benoît, I was applauding the initiative and the performance, but I only had a quick look at the changes, no thorough review. What I saw looked good to me, but I have not know the program before, so I'm certainly not the best judge.

I would be in favour of merging this PR on the basis of the test suite (it shall only be a click on the "merge" button for @david-a-wheeler) and using it in the set.mm test suite (if any change is necessary, we shall have the necessary permissions in set.mm).

I know David has been on-and-off recently, and it's not easy for him to follow up everywhere. Generally I think it would probably be best if we could rather move this project to the metamath organization where continuity is ensured.

benjub commented 2 years ago

Thanks @tirix. It looks like @david-a-wheeler has no time for this now. So maybe I can either modify https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml to use my fork of mmverify.py, or maybe better as you suggest, put a copy of this faster mmverify.py in a new folder, say metamath/set.mm/verifiers ? Maybe @digama0 has some advice ?

david-a-wheeler commented 2 years ago

@benjub - How about I give you rights to push changes to this directly? Just give others a chance to review the change first.

david-a-wheeler commented 2 years ago

@benjub @tirix - I agree that in the longer term this should probably move to the "Metamath" group instead. In the short term, I've added @benjub as a collaborator so merges can now happen directly.

Sorry I haven't been very available, it's quite out of my control.

benjub commented 2 years ago

Thanks @david-a-wheeler. As you propose, I'll merge this PR after and if this is reviewed. Maybe by @tirix or @digama0 if/when they have time.

david-a-wheeler commented 2 years ago

Thank you. And again, my apologies for my delays. I've had a lot of personal issues that have prevented me from giving these issues the time I want to give them.

david-a-wheeler commented 2 years ago

Good. I've invited @tirix and @digama0 also as collaborators. Moving this to the Metamath org might be good long-term, but this will at least get things unstuck.

benjub commented 2 years ago

@tirix : I hope this answers your remarks ? Note that mmverify.py (old and new) allows local $v's and $f's (which to me is fine), and also local $c's (which should be changed to global $c's: I'm preparing a PR for that).

benjub commented 2 years ago

Regarding the latest (minor) commit: improve consistency: in floating hypotheses, the order is: first typecode, second variable, both in the database (e.g. $f set x $.) and in the internal data structure used by mmverify.py to store it. This is opposite to the usual writing in type theory (e.g. x:set) but at least this is consistent internally, since the typecode comes first in all Metamath math expressions.

benjub commented 2 years ago

Note: https://github.com/david-a-wheeler/mmverify.py/pull/7/commits/0999f6a5b8cb38239223bb651ae291900018402c introduced a bug. Investigating...

benjub commented 2 years ago

Interesting to know that len(saved_stmts) is less performant.

Well, it seemed a good idea since the length of that array is updated rarely compared to the number of times it is called, and the updates are only of the form "+1". The timings I did did not show significant differences (although they always gave a small advantage to my method). I'm seeing in https://stackoverflow.com/questions/699177/python-do-python-lists-keep-a-count-for-len-or-does-it-count-for-each-call that the length is actually stored (C source code: https://github.com/python/cpython/blob/2.5/Objects/listobject.c#L379). On the other hand, calling len makes a global lookup, which is expensive in Python (compared to using local variables)... All in all, is this optimization worth it ? For the moment, I leave it here.