ansys / pre-commit-hooks

Ansys-developed pre-commit hooks for automating style and formatting
https://pre-commit-hooks.docs.ansys.com/
MIT License

Improving `add_license_header` performance #214

Closed germa89 closed 3 months ago

germa89 commented 3 months ago

Just playing.... for the moment.

Current status (no changes)

(Runs are separated by `;`.)

- Files needing linting (`missing_headers`): 1847; 1847
- Time-cprofiler: 1.227s; 2.061s
- Hook duration: 2.83s; 3.32s

Profiler output

``` Tue Aug 6 19:48:59 2024 profile_results.prof 946534 function calls (881403 primitive calls) in 1.227 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.227 1.227 add_license_headers.py:640(find_files_missing_header) 33 0.000 0.000 1.150 0.035 connection.py:246(recv) 1 0.001 0.001 1.123 1.123 add_license_headers.py:177(list_noncompliant_files) 80/76 0.152 0.002 1.121 0.015 {built-in method posix.read} 70/66 0.000 0.000 1.117 0.017 connection.py:390(_recv) 1 0.000 0.000 1.114 1.114 lint.py:339(run) 35/33 0.000 0.000 1.101 0.033 connection.py:429(_recv_bytes) 1 0.001 0.001 1.027 1.027 report.py:274(generate) 1 0.000 0.000 1.024 1.024 pool.py:738(__exit__) 1 0.000 0.000 1.014 1.014 pool.py:654(terminate) 15 0.000 0.000 0.985 0.066 util.py:208(__call__) 1 0.000 0.000 0.984 0.984 pool.py:680(_terminate_pool) 1 0.000 0.000 0.970 0.970 pool.py:671(_help_stuff_finish) 1 0.000 0.000 0.970 0.970 {method 'acquire' of '_multiprocessing.SemLock' objects} 3/1 0.000 0.000 0.970 0.970 threading.py:1016(_bootstrap) 3/1 0.000 0.000 0.970 0.970 threading.py:1056(_bootstrap_inner) 3/1 0.000 0.000 0.970 0.970 threading.py:999(run) 1 0.000 0.000 0.970 0.970 pool.py:573(_handle_results) 1 0.000 0.000 0.970 0.970 pool.py:527(_handle_tasks) 1 0.000 0.000 0.969 0.969 pool.py:362(map) 1 0.000 0.000 0.622 0.622 pool.py:767(get) 23 0.000 0.000 0.577 0.025 pool.py:333(_maintain_pool) 231 0.000 0.000 0.575 0.002 popen_fork.py:24(poll) 23 0.000 0.000 0.574 0.025 pool.py:289(_join_exited_workers) 192 0.000 0.000 0.572 0.003 process.py:224(exitcode) 1 0.000 0.000 0.347 0.347 pool.py:471(_map_async) 1902 0.006 0.000 0.347 0.000 project.py:160(all_files) 40 0.000 0.000 0.322 0.008 connection.py:202(send) 45 0.000 0.000 0.315 0.007 connection.py:406(_send_bytes) 77 0.000 0.000 0.302 0.004 connection.py:381(_send) 109 0.302 0.003 0.302 0.003 {built-in method posix.write} 291 0.003 0.000 0.252 0.001 :282(walk) 293 0.198 0.001 0.198 
0.001 {built-in method posix.scandir} 53 0.000 0.000 0.143 0.003 selectors.py:402(select) 53 0.142 0.003 0.142 0.003 {method 'poll' of 'select.poll' objects} 23 0.000 0.000 0.130 0.006 pool.py:500(_wait_for_updates) 49 0.000 0.000 0.105 0.002 connection.py:1122(wait) 1 0.000 0.000 0.086 0.086 lint.py:235(format_json) 2275 0.006 0.000 0.064 0.000 project.py:356(_is_path_ignored) 6100 0.004 0.000 0.056 0.000 pathlib.py:407(_load_parts) 2783/2689 0.050 0.000 0.052 0.000 {built-in method builtins.next} 1 0.001 0.001 0.052 0.052 report.py:124(to_dict_lint) 2 0.000 0.000 0.051 0.025 cmd.py:986() 2 0.000 0.000 0.051 0.025 cmd.py:1522(_call_process) 2 0.000 0.000 0.050 0.025 cmd.py:1079(execute) 1 0.000 0.000 0.048 0.048 add_license_headers.py:526(update_license_file) 14147 0.006 0.000 0.048 0.000 pathlib.py:437(__str__) 35 0.007 0.000 0.044 0.001 {built-in method _pickle.loads} 8668 0.004 0.000 0.043 0.000 pathlib.py:1157(__init__) 2 0.000 0.000 0.043 0.021 subprocess.py:1165(communicate) 2 0.000 0.000 0.043 0.021 subprocess.py:2062(_communicate) 10569 0.023 0.000 0.041 0.000 pathlib.py:358(__init__) 16782 0.003 0.000 0.039 0.000 pathlib.py:551(drive) 6100 0.022 0.000 0.036 0.000 pathlib.py:387(_parse_path) 1 0.000 0.000 0.035 0.035 __init__.py:183(dumps) 1 0.003 0.003 0.035 0.035 encoder.py:183(encode) 42181/41525 0.005 0.000 0.031 0.000 encoder.py:414(_iterencode) 37 0.028 0.001 0.029 0.001 {built-in method _io.open} 3553 0.001 0.000 0.027 0.000 pathlib.py:524(__hash__) 90227/41525 0.013 0.000 0.027 0.000 encoder.py:334(_iterencode_dict) 6995 0.020 0.000 0.027 0.000 {built-in method posix.stat} 6769 0.001 0.000 0.027 0.000 pathlib.py:835(stat) 1903 0.001 0.000 0.025 0.000 pathlib.py:484(_str_normcase) 45 0.002 0.000 0.024 0.001 reduction.py:48(dumps) 14863 0.003 0.000 0.024 0.000 pathlib.py:569(_tail) 2289 0.001 0.000 0.023 0.000 pathlib.py:583(name) 61 0.020 0.000 0.023 0.000 {method 'dump' of '_pickle.Pickler' objects} 1901 0.002 0.000 0.023 0.000 
report.py:532(to_dict_lint) 4 0.000 0.000 0.023 0.006 report.py:394(files_without_copyright) 26 0.000 0.000 0.023 0.001 connection.py:253(poll) 25 0.000 0.000 0.023 0.001 queues.py:374(empty) 56372/41433 0.007 0.000 0.021 0.000 encoder.py:278(_iterencode_list) 1 0.000 0.000 0.021 0.021 pool.py:305(_repopulate_pool) 1 0.000 0.000 0.021 0.021 pool.py:314(_repopulate_pool_static) 3/1 0.000 0.000 0.021 0.021 add_license_headers.py:337(recursive_file_check) 8 0.000 0.000 0.021 0.003 process.py:110(start) 16/6 0.000 0.000 0.020 0.003 :1349(_find_and_load) 16/6 0.000 0.000 0.020 0.003 :1304(_find_and_load_unlocked) 8 0.000 0.000 0.020 0.003 context.py:286(_Popen) 15/6 0.000 0.000 0.020 0.003 :911(_load_unlocked) 11/6 0.000 0.000 0.020 0.003 :989(exec_module) 1901 0.001 0.000 0.019 0.000 pathlib.py:450(as_posix) 8 0.000 0.000 0.017 0.002 popen_spawn_posix.py:30(__init__) 8 0.000 0.000 0.017 0.002 popen_fork.py:15(__init__) 8 0.001 0.000 0.017 0.002 popen_spawn_posix.py:38(_launch) 4723 0.009 0.000 0.017 0.000 :71(join) 2275 0.001 0.000 0.016 0.000 pathlib.py:886(is_file) 14/8 0.000 0.000 0.013 0.002 {built-in method builtins.exec} 37/12 0.000 0.000 0.013 0.001 :480(_call_with_frames_removed) 4557 0.003 0.000 0.013 0.000 pathlib.py:380(with_segments) 167362 0.012 0.000 0.012 0.000 {built-in method builtins.isinstance} 11 0.000 0.000 0.011 0.001 :1062(get_code) 12 0.011 0.001 0.011 0.001 {built-in method posix.getcwd} 8 0.000 0.000 0.011 0.001 process.py:128(terminate) 8 0.000 0.000 0.011 0.001 popen_fork.py:56(terminate) 8 0.000 0.000 0.011 0.001 popen_fork.py:46(_send_signal) 8 0.011 0.001 0.011 0.001 {built-in method posix.kill} 2277 0.002 0.000 0.010 0.000 pathlib.py:731(parent) 11 0.000 0.000 0.010 0.001 :1183(get_data) 3 0.000 0.000 0.010 0.003 subprocess.py:807(__init__) 11 0.010 0.001 0.010 0.001 {built-in method _io.open_code} 2 0.000 0.000 0.010 0.005 _annotate.py:502(run) 3 0.000 0.000 0.009 0.003 add_license_headers.py:420(check_same_content) 3 0.000 0.000 0.009 
0.003 filecmp.py:30(cmp) 26 0.000 0.000 0.009 0.000 connection.py:439(_poll) 1 0.000 0.000 0.009 0.009 filecmp.py:75(_do_cmp) 2278 0.001 0.000 0.009 0.000 pathlib.py:719(__truediv__) 8 0.001 0.000 0.009 0.001 spawn.py:160(get_preparation_data) 3 0.001 0.000 0.009 0.003 subprocess.py:1791(_execute_child) 2278 0.001 0.000 0.008 0.000 pathlib.py:711(joinpath) 62254 0.008 0.000 0.008 0.000 {built-in method sys.intern} 3 0.000 0.000 0.008 0.003 context.py:110(SimpleQueue) 1 0.000 0.000 0.008 0.008 pool.py:345(_setup_queues) 1 0.000 0.000 0.008 0.008 pool.py:1() 2242 0.001 0.000 0.008 0.000 pathlib.py:909(is_symlink) 2277 0.001 0.000 0.008 0.000 pathlib.py:420(_from_parsed_parts) 1 0.000 0.000 0.007 0.007 base.py:172(__init__) 3 0.000 0.000 0.007 0.002 queues.py:361(__init__) 15 0.000 0.000 0.007 0.000 :806(module_from_spec) 6 0.000 0.000 0.007 0.001 context.py:65(Lock) 12 0.007 0.001 0.007 0.001 {built-in method _posixsubprocess.fork_exec} 6784 0.002 0.000 0.007 0.000 pathlib.py:447(__fspath__) 3 0.000 0.000 0.007 0.002 :1287(create_module) 3 0.007 0.002 0.007 0.002 {built-in method _imp.create_dynamic} 2242 0.001 0.000 0.007 0.000 pathlib.py:842(lstat) 60960 0.006 0.000 0.006 0.000 {method 'append' of 'list' objects} # Shorted ```

Skipping some dirs

Let's extend `reuse._IGNORE_DIR_PATTERNS` so that directories like `.venv` are skipped.

- Files needing linting (`missing_headers`): 61
- Time-cprofiler: 0.871s
- Hook duration: 2.32s

This could be expanded to omit every file that is not `.py`, `.rst`, etc. Of course, this is not part of the API, but... 🤷🏻‍♂️
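A minimal sketch of the idea, independent of REUSE's internals: prune ignored directories while walking the tree, then keep only files with selected extensions. The names `IGNORE_DIR_PATTERNS`, `KEEP_SUFFIXES`, and `iter_candidate_files` are hypothetical, and this does not reproduce how `reuse` itself walks a project.

```python
import os
import re
from pathlib import Path
from typing import Iterator

# Hypothetical patterns; NOT the actual contents of reuse._IGNORE_DIR_PATTERNS.
IGNORE_DIR_PATTERNS = [
    re.compile(r"^\.git$"),
    re.compile(r"^\.venv$"),
    re.compile(r"^__pycache__$"),
]
# Hypothetical allow-list of extensions worth checking.
KEEP_SUFFIXES = {".py", ".rst"}


def iter_candidate_files(root: str) -> Iterator[Path]:
    """Yield files under ``root`` that are worth checking for a license header."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames
            if not any(p.match(d) for p in IGNORE_DIR_PATTERNS)
        ]
        for name in filenames:
            path = Path(dirpath, name)
            if path.suffix in KEEP_SUFFIXES:
                yield path
```

Pruning at the directory level means the contents of `.venv` are never listed at all, which is cheaper than listing them and filtering afterwards.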

Profiler output

``` Tue Aug 6 19:52:39 2024 profile_results.prof 68070 function calls (64836 primitive calls) in 0.871 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.871 0.871 add_license_headers.py:640(find_files_missing_header) 1 0.000 0.000 0.789 0.789 add_license_headers.py:177(list_noncompliant_files) 1 0.000 0.000 0.783 0.783 lint.py:339(run) 1 0.000 0.000 0.779 0.779 report.py:274(generate) 1 0.000 0.000 0.779 0.779 pool.py:738(__exit__) 1 0.000 0.000 0.766 0.766 pool.py:654(terminate) 15 0.000 0.000 0.732 0.049 util.py:208(__call__) 1 0.000 0.000 0.732 0.732 pool.py:680(_terminate_pool) 24 0.000 0.000 0.725 0.030 connection.py:246(recv) 60/58 0.037 0.001 0.718 0.012 {built-in method posix.read} 25/24 0.000 0.000 0.716 0.030 connection.py:429(_recv_bytes) 50/48 0.000 0.000 0.716 0.015 connection.py:390(_recv) 1 0.000 0.000 0.683 0.683 pool.py:671(_help_stuff_finish) 1 0.001 0.001 0.683 0.683 {method 'acquire' of '_multiprocessing.SemLock' objects} 3/1 0.000 0.000 0.682 0.682 threading.py:1016(_bootstrap) 3/1 0.000 0.000 0.682 0.682 threading.py:1056(_bootstrap_inner) 3/1 0.000 0.000 0.682 0.682 threading.py:999(run) 1 0.000 0.000 0.682 0.682 pool.py:573(_handle_results) 1 0.000 0.000 0.681 0.681 pool.py:527(_handle_tasks) 1 0.000 0.000 0.645 0.645 pool.py:506(_handle_workers) 10 0.000 0.000 0.632 0.063 pool.py:500(_wait_for_updates) 12 0.000 0.000 0.632 0.053 connection.py:253(poll) 11 0.000 0.000 0.631 0.057 queues.py:374(empty) 12 0.000 0.000 0.630 0.053 connection.py:439(_poll) 31 0.000 0.000 0.598 0.019 connection.py:202(send) 36 0.000 0.000 0.594 0.017 connection.py:406(_send_bytes) 91 0.594 0.007 0.594 0.007 {built-in method posix.write} 59 0.000 0.000 0.594 0.010 connection.py:381(_send) 7 0.000 0.000 0.047 0.007 process.py:128(terminate) 7 0.000 0.000 0.047 0.007 popen_fork.py:56(terminate) 7 0.000 0.000 0.046 0.007 popen_fork.py:46(_send_signal) 7 0.046 0.007 0.046 0.007 {built-in method 
posix.kill} 1 0.000 0.000 0.036 0.036 pool.py:471(_map_async) 69 0.001 0.000 0.036 0.001 project.py:160(all_files) 2 0.000 0.000 0.035 0.017 cmd.py:986() 2 0.000 0.000 0.035 0.017 cmd.py:1522(_call_process) 2 0.000 0.000 0.035 0.017 cmd.py:1079(execute) 1 0.000 0.000 0.030 0.030 add_license_headers.py:526(update_license_file) 25 0.000 0.000 0.029 0.001 :282(walk) 26 0.000 0.000 0.028 0.001 selectors.py:402(select) 26 0.028 0.001 0.028 0.001 {method 'poll' of 'select.poll' objects} 2 0.000 0.000 0.028 0.014 subprocess.py:1165(communicate) 2 0.000 0.000 0.028 0.014 subprocess.py:2062(_communicate) 37 0.026 0.001 0.027 0.001 {built-in method _io.open} 27 0.025 0.001 0.025 0.001 {built-in method posix.scandir} 16/6 0.000 0.000 0.025 0.004 :1349(_find_and_load) 16/6 0.000 0.000 0.025 0.004 :1304(_find_and_load_unlocked) 15/6 0.000 0.000 0.024 0.004 :911(_load_unlocked) 11/6 0.000 0.000 0.024 0.004 :989(exec_module) 1 0.000 0.000 0.022 0.022 pool.py:305(_repopulate_pool) 1 0.000 0.000 0.022 0.022 pool.py:314(_repopulate_pool_static) 8 0.000 0.000 0.022 0.003 process.py:110(start) 8 0.000 0.000 0.021 0.003 context.py:286(_Popen) 3/1 0.000 0.000 0.020 0.020 add_license_headers.py:337(recursive_file_check) 8 0.000 0.000 0.018 0.002 popen_spawn_posix.py:30(__init__) 8 0.000 0.000 0.018 0.002 popen_fork.py:15(__init__) 8 0.001 0.000 0.018 0.002 popen_spawn_posix.py:38(_launch) 14/8 0.000 0.000 0.017 0.002 {built-in method builtins.exec} 37/12 0.000 0.000 0.017 0.001 :480(_call_with_frames_removed) 52 0.012 0.000 0.014 0.000 {method 'dump' of '_pickle.Pickler' objects} 11 0.000 0.000 0.013 0.001 :1062(get_code) 36 0.001 0.000 0.012 0.000 reduction.py:48(dumps) 11 0.000 0.000 0.011 0.001 :1183(get_data) 11 0.011 0.001 0.011 0.001 {built-in method _io.open_code} 3 0.000 0.000 0.011 0.004 context.py:110(SimpleQueue) 1 0.000 0.000 0.011 0.011 pool.py:1() 1 0.000 0.000 0.011 0.011 pool.py:345(_setup_queues) 2 0.000 0.000 0.010 0.005 _annotate.py:502(run) 3 0.000 0.000 0.010 0.003 
queues.py:361(__init__) 6 0.000 0.000 0.010 0.002 context.py:65(Lock) 3 0.000 0.000 0.009 0.003 subprocess.py:807(__init__) 3 0.000 0.000 0.009 0.003 shutil.py:230(copyfile) 3 0.000 0.000 0.009 0.003 add_license_headers.py:420(check_same_content) 3 0.000 0.000 0.009 0.003 filecmp.py:30(cmp) 1 0.000 0.000 0.008 0.008 filecmp.py:75(_do_cmp) 6 0.000 0.000 0.008 0.001 synchronize.py:168(__init__) 6 0.000 0.000 0.008 0.001 synchronize.py:50(__init__) 8 0.000 0.000 0.008 0.001 spawn.py:160(get_preparation_data) 15 0.000 0.000 0.008 0.001 :806(module_from_spec) 12 0.008 0.001 0.008 0.001 {built-in method posix.getcwd} 3 0.000 0.000 0.008 0.003 :1287(create_module) 3 0.008 0.003 0.008 0.003 {built-in method _imp.create_dynamic} 3 0.001 0.000 0.007 0.002 subprocess.py:1791(_execute_child) 12 0.007 0.001 0.007 0.001 {built-in method _posixsubprocess.fork_exec} 5 0.000 0.000 0.006 0.001 {built-in method builtins.__import__} 1 0.000 0.000 0.006 0.006 add_license_headers.py:116(link_assets) 9 0.001 0.000 0.005 0.001 util.py:450(spawnv_passfds) 2 0.000 0.000 0.005 0.003 add_license_headers.py:151(mkdirs_and_link) 1 0.000 0.000 0.005 0.005 base.py:172(__init__) 2 0.000 0.000 0.005 0.003 _annotate.py:257(get_template) 2 0.000 0.000 0.005 0.003 _annotate.py:87(find_template) 4 0.000 0.000 0.005 0.001 environment.py:978(get_template) 4 0.000 0.000 0.005 0.001 environment.py:953(_load_template) 4 0.000 0.000 0.005 0.001 loaders.py:107(load) 3/2 0.000 0.000 0.005 0.002 :200(makedirs) 3 0.005 0.002 0.005 0.002 {built-in method posix.mkdir} 28/23 0.000 0.000 0.005 0.000 :1390(_handle_fromlist) 345/251 0.003 0.000 0.005 0.000 {built-in method builtins.next} 103 0.001 0.000 0.004 0.000 project.py:356(_is_path_ignored) 1 0.000 0.000 0.004 0.004 resource_tracker.py:1() 2 0.000 0.000 0.004 0.002 environment.py:728(compile) 2 0.000 0.000 0.004 0.002 _annotate.py:112(add_header_to_file) 524 0.003 0.000 0.003 0.000 {built-in method posix.stat} 1 0.000 0.000 0.003 0.003 connection.py:1() 1 0.000 
0.000 0.003 0.003 cmd.py:662(is_cygwin) 1 0.000 0.000 0.003 0.003 util.py:486(is_cygwin_git) 1 0.000 0.000 0.003 0.003 util.py:455(_is_cygwin_git) 16 0.000 0.000 0.003 0.000 reduction.py:58(dump) 1 0.000 0.000 0.003 0.003 queue.py:1() 127 0.000 0.000 0.003 0.000 popen_fork.py:24(poll) 129 0.003 0.000 0.003 0.000 {built-in method posix.waitpid} 1 0.000 0.000 0.003 0.003 lint.py:235(format_json) 11 0.000 0.000 0.003 0.000 process.py:142(join) 11 0.000 0.000 0.003 0.000 popen_fork.py:36(wait) # Shorted ```

By the way, in theory they skip every file ignored by the gitignore, but:

a) I couldn't find where this is implemented.
b) For me, the files in `.venv` were still reported as missing headers.
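One way to check point (b) outside the hook is to ask Git directly which files it tracks; files under an ignored `.venv` should not appear in that list. A minimal sketch, assuming `git` is on `PATH` and the function is called inside a repository. `parse_nul_separated` and `tracked_files` are hypothetical helpers, not part of the hook or of `reuse`.

```python
import subprocess


def parse_nul_separated(raw: bytes) -> list[str]:
    """Split the NUL-separated output of ``git ls-files -z`` into paths."""
    return [item.decode("utf-8") for item in raw.split(b"\0") if item]


def tracked_files(repo: str = ".") -> list[str]:
    """Return the paths Git tracks in ``repo``."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo,
        check=True,
        capture_output=True,
    )
    return parse_nul_separated(result.stdout)
```

Calling `tracked_files()` from the repository root and looking for paths that start with `.venv/` would show whether Git itself considers those files part of the project.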

Avoiding writing a json file

This also includes the directory-skipping changes (incremental).

Using `StringIO`, we should be able to avoid writing a text file.
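A minimal sketch of the `StringIO` idea: any function that writes its report to a file-like object can be pointed at an in-memory buffer instead of a file on disk. Here `write_json_report` is a hypothetical stand-in for the lint step; it is not the `reuse` API.

```python
import io
import json
from typing import TextIO


def write_json_report(missing: list[str], out: TextIO) -> None:
    """Hypothetical stand-in for a linter that writes a JSON report to ``out``."""
    json.dump({"missing_headers": missing}, out)


def files_missing_headers(missing: list[str]) -> list[str]:
    """Capture the report in memory and parse it, without a temporary file."""
    buffer = io.StringIO()
    write_json_report(missing, buffer)
    return json.loads(buffer.getvalue())["missing_headers"]
```

The same buffer could be passed wherever the real code currently passes an open file handle, provided that code only relies on the text-stream interface.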

(Runs are separated by `;`.)

- Files needing linting (`missing_headers`): 61
- Time-cprofiler: 0.955s; 0.882s
- Hook duration: 2.17s; 1.9s

Profiler output

``` Tue Aug 6 20:22:34 2024 profile_results.prof 68182 function calls (64944 primitive calls) in 0.955 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.955 0.955 add_license_headers.py:636(find_files_missing_header) 1 0.000 0.000 0.865 0.865 add_license_headers.py:177(list_noncompliant_files) 1 0.000 0.000 0.865 0.865 lint.py:339(run) 1 0.000 0.000 0.862 0.862 report.py:274(generate) 1 0.000 0.000 0.862 0.862 pool.py:738(__exit__) 1 0.000 0.000 0.840 0.840 pool.py:654(terminate) 15 0.000 0.000 0.801 0.053 util.py:208(__call__) 1 0.000 0.000 0.800 0.800 pool.py:680(_terminate_pool) 24 0.000 0.000 0.800 0.033 connection.py:246(recv) 63/59 0.045 0.001 0.795 0.013 {built-in method posix.read} 26/24 0.000 0.000 0.793 0.033 connection.py:429(_recv_bytes) 52/48 0.000 0.000 0.792 0.017 connection.py:390(_recv) 1 0.000 0.000 0.760 0.760 pool.py:671(_help_stuff_finish) 1 0.010 0.010 0.760 0.760 {method 'acquire' of '_multiprocessing.SemLock' objects} 3/1 0.000 0.000 0.750 0.750 threading.py:1016(_bootstrap) 3/1 0.000 0.000 0.750 0.750 threading.py:1056(_bootstrap_inner) 3/1 0.000 0.000 0.750 0.750 threading.py:999(run) 1 0.000 0.000 0.750 0.750 pool.py:573(_handle_results) 1 0.000 0.000 0.749 0.749 pool.py:527(_handle_tasks) 1 0.000 0.000 0.749 0.749 pool.py:362(map) 1 0.000 0.000 0.719 0.719 pool.py:767(get) 11 0.000 0.000 0.705 0.064 pool.py:500(_wait_for_updates) 14 0.000 0.000 0.703 0.050 connection.py:253(poll) 13 0.000 0.000 0.703 0.054 queues.py:374(empty) 14 0.000 0.000 0.703 0.050 connection.py:439(_poll) 31 0.000 0.000 0.665 0.021 connection.py:202(send) 36 0.000 0.000 0.660 0.018 connection.py:406(_send_bytes) 91 0.660 0.007 0.660 0.007 {built-in method posix.write} 59 0.000 0.000 0.660 0.011 connection.py:381(_send) 16/6 0.000 0.000 0.047 0.008 :1349(_find_and_load) 16/6 0.000 0.000 0.047 0.008 :1304(_find_and_load_unlocked) 15/6 0.000 0.000 0.047 0.008 :911(_load_unlocked) 11/6 0.000 0.000 0.046 
0.008 :989(exec_module) 2 0.000 0.000 0.046 0.023 cmd.py:986() 2 0.000 0.000 0.046 0.023 cmd.py:1522(_call_process) 2 0.000 0.000 0.045 0.023 cmd.py:1079(execute) 2 0.000 0.000 0.040 0.020 subprocess.py:1165(communicate) 2 0.000 0.000 0.040 0.020 subprocess.py:2062(_communicate) 29 0.000 0.000 0.040 0.001 selectors.py:402(select) 29 0.040 0.001 0.040 0.001 {method 'poll' of 'select.poll' objects} 8 0.000 0.000 0.038 0.005 process.py:128(terminate) 8 0.000 0.000 0.038 0.005 popen_fork.py:56(terminate) 8 0.000 0.000 0.037 0.005 popen_fork.py:46(_send_signal) 8 0.037 0.005 0.037 0.005 {built-in method posix.kill} 11 0.000 0.000 0.032 0.003 :1062(get_code) 11 0.000 0.000 0.031 0.003 :1183(get_data) 11 0.030 0.003 0.030 0.003 {built-in method _io.open_code} 1 0.000 0.000 0.030 0.030 pool.py:471(_map_async) 69 0.001 0.000 0.030 0.000 project.py:160(all_files) 14/8 0.000 0.000 0.026 0.003 {built-in method builtins.exec} 37/12 0.000 0.000 0.026 0.002 :480(_call_with_frames_removed) 25 0.000 0.000 0.023 0.001 :282(walk) 1 0.000 0.000 0.023 0.023 add_license_headers.py:522(update_license_file) 3/1 0.000 0.000 0.023 0.023 add_license_headers.py:333(recursive_file_check) 35 0.020 0.001 0.020 0.001 {built-in method _io.open} 1 0.000 0.000 0.020 0.020 pool.py:305(_repopulate_pool) 1 0.000 0.000 0.020 0.020 pool.py:314(_repopulate_pool_static) 27 0.020 0.001 0.020 0.001 {built-in method posix.scandir} 3 0.000 0.000 0.019 0.006 context.py:110(SimpleQueue) 1 0.000 0.000 0.019 0.019 pool.py:345(_setup_queues) 8 0.000 0.000 0.019 0.002 process.py:110(start) 8 0.000 0.000 0.019 0.002 context.py:286(_Popen) 1 0.000 0.000 0.017 0.017 pool.py:1() 8 0.000 0.000 0.014 0.002 popen_spawn_posix.py:30(__init__) 8 0.000 0.000 0.014 0.002 popen_fork.py:15(__init__) 8 0.001 0.000 0.013 0.002 popen_spawn_posix.py:38(_launch) 36 0.001 0.000 0.013 0.000 reduction.py:48(dumps) 3 0.000 0.000 0.013 0.004 queues.py:361(__init__) 6 0.000 0.000 0.013 0.002 context.py:65(Lock) 52 0.012 0.000 0.013 0.000 
{method 'dump' of '_pickle.Pickler' objects} 15 0.000 0.000 0.013 0.001 :806(module_from_spec) 3 0.000 0.000 0.012 0.004 :1287(create_module) 3 0.012 0.004 0.012 0.004 {built-in method _imp.create_dynamic} 2 0.000 0.000 0.012 0.006 _annotate.py:502(run) 5 0.000 0.000 0.011 0.002 {built-in method builtins.__import__} 6 0.000 0.000 0.010 0.002 synchronize.py:168(__init__) 6 0.000 0.000 0.010 0.002 synchronize.py:50(__init__) 28/23 0.000 0.000 0.008 0.000 :1390(_handle_fromlist) 3 0.000 0.000 0.008 0.003 add_license_headers.py:416(check_same_content) 3 0.000 0.000 0.008 0.003 filecmp.py:30(cmp) 1 0.000 0.000 0.008 0.008 filecmp.py:75(_do_cmp) 2 0.000 0.000 0.007 0.004 _annotate.py:257(get_template) 2 0.000 0.000 0.007 0.004 _annotate.py:87(find_template) 4 0.000 0.000 0.007 0.002 environment.py:978(get_template) 4 0.000 0.000 0.007 0.002 environment.py:953(_load_template) 4 0.000 0.000 0.007 0.002 loaders.py:107(load) 3 0.000 0.000 0.007 0.002 shutil.py:230(copyfile) 3 0.000 0.000 0.007 0.002 subprocess.py:807(__init__) 344/250 0.003 0.000 0.007 0.000 {built-in method builtins.next} 8 0.000 0.000 0.007 0.001 spawn.py:160(get_preparation_data) 12 0.007 0.001 0.007 0.001 {built-in method posix.getcwd} 1 0.000 0.000 0.006 0.006 resource_tracker.py:1() 3 0.001 0.000 0.006 0.002 subprocess.py:1791(_execute_child) 2 0.000 0.000 0.006 0.003 environment.py:728(compile) 1 0.000 0.000 0.006 0.006 connection.py:1() 12 0.005 0.000 0.005 0.000 {built-in method _posixsubprocess.fork_exec} 1 0.000 0.000 0.005 0.005 base.py:172(__init__) 103 0.000 0.000 0.005 0.000 project.py:356(_is_path_ignored) 2 0.000 0.000 0.004 0.002 environment.py:615(_parse) 2 0.000 0.000 0.004 0.002 parser.py:1037(parse) 9 0.000 0.000 0.004 0.000 util.py:450(spawnv_passfds) 8/2 0.000 0.000 0.004 0.002 parser.py:988(subparse) 1 0.000 0.000 0.004 0.004 queue.py:1() 6 0.000 0.000 0.004 0.001 parser.py:167(parse_statement) 78 0.000 0.000 0.003 0.000 lexer.py:380(__next__) 2 0.000 0.000 0.003 0.002 
_annotate.py:112(add_header_to_file) 78 0.000 0.000 0.003 0.000 lexer.py:615(wrap) 262 0.000 0.000 0.003 0.000 pathlib.py:407(_load_parts) 135 0.000 0.000 0.003 0.000 popen_fork.py:24(poll) 2 0.000 0.000 0.003 0.002 parser.py:255(parse_if) 26 0.000 0.000 0.003 0.000 lexer.py:403(expect) 138 0.003 0.000 0.003 0.000 {built-in method posix.waitpid} 1 0.000 0.000 0.003 0.003 lint.py:235(format_json) 1 0.000 0.000 0.003 0.003 add_license_headers.py:116(link_assets) 1 0.000 0.000 0.003 0.003 cmd.py:662(is_cygwin) 292 0.000 0.000 0.003 0.000 {method 'decode' of 'bytes' objects} 1 0.000 0.000 0.003 0.003 util.py:486(is_cygwin_git) 1 0.000 0.000 0.003 0.003 util.py:455(_is_cygwin_git) 2 0.000 0.000 0.003 0.001 __init__.py:71(search_function) 11 0.000 0.000 0.003 0.000 process.py:142(join) 11 0.000 0.000 0.003 0.000 popen_fork.py:36(wait) 2 0.000 0.000 0.003 0.001 add_license_headers.py:151(mkdirs_and_link) 2 0.003 0.001 0.003 0.001 {built-in method posix.symlink} 523 0.002 0.000 0.002 0.000 {built-in method posix.stat} 667 0.000 0.000 0.002 0.000 pathlib.py:569(_tail) 117 0.000 0.000 0.002 0.000 pathlib.py:583(name) 1 0.000 0.000 0.002 0.002 popen_spawn_posix.py:1() 447 0.002 0.000 0.002 0.000 :71(join) 22 0.000 0.000 0.002 0.000 fileinput.py:249(__next__) 2 0.000 0.000 0.002 0.001 header.py:225(find_and_replace_header) 1 0.000 0.000 0.002 0.002 _annotate.py:375(add_arguments) 11 0.000 0.000 0.002 0.000 pool.py:333(_maintain_pool) 26 0.001 0.000 0.002 0.000 {built-in method _pickle.loads} 24 0.000 0.000 0.002 0.000 gettext.py:616(gettext) 24 0.000 0.000 0.002 0.000 gettext.py:578(dgettext) 9 0.002 0.000 0.002 0.000 {built-in method posix.open} 24 0.000 0.000 0.002 0.000 gettext.py:519(translation) 1 0.000 0.000 0.002 0.002 report.py:124(to_dict_lint) 2 0.000 0.000 0.001 0.001 fileinput.py:301(_readline) 24 0.000 0.000 0.001 0.000 gettext.py:479(find) 591 0.000 0.000 0.001 0.000 pathlib.py:437(__str__) 392 0.000 0.000 0.001 0.000 pathlib.py:1157(__init__) 2 0.000 0.000 0.001 
0.001 environment.py:679(_generate) 2 0.000 0.000 0.001 0.001 compiler.py:101(generate) 196/2 0.000 0.000 0.001 0.001 visitor.py:35(visit) 1 0.000 0.000 0.001 0.001 __init__.py:183(dumps) 2 0.000 0.000 0.001 0.001 compiler.py:829(visit_Template) 1 0.000 0.000 0.001 0.001 encoder.py:183(encode) 302 0.000 0.000 0.001 0.000 pathlib.py:835(stat) 16 0.000 0.000 0.001 0.000 reduction.py:58(dump) 4 0.000 0.000 0.001 0.000 loaders.py:194(get_source) 11 0.000 0.000 0.001 0.000 pool.py:289(_join_exited_workers) 460 0.001 0.000 0.001 0.000 pathlib.py:358(__init__) 1643/1581 0.000 0.000 0.001 0.000 encoder.py:414(_iterencode) 2 0.000 0.000 0.001 0.001 tempfile.py:537(NamedTemporaryFile) 11 0.000 0.000 0.001 0.000 :751(_compile_bytecode) 262 0.001 0.000 0.001 0.000 pathlib.py:387(_parse_path) 2 0.000 0.000 0.001 0.001 check.py:18(is_binary) 2 0.000 0.000 0.001 0.001 add_license_headers.py:615(cleanup) 3/1 0.000 0.000 0.001 0.001 config.py:111(assure_data_present) 744 0.000 0.000 0.001 0.000 pathlib.py:551(drive) 3397/1581 0.001 0.000 0.001 0.000 encoder.py:334(_iterencode_dict) # Shorted ```

Notes

```python
def main():
    """Find files missing license headers and run `REUSE <https://reuse.software/>`_ on them."""
    import cProfile
    import pstats

    # Profile only the call that does the actual work.
    profile = cProfile.Profile()
    profile.enable()
    out = find_files_missing_header()
    profile.disable()
    profile.dump_stats('profile_results.prof')

    # Write a human-readable report, sorted by cumulative time.
    with open('profile_report.txt', 'w') as report_file:
        stats = pstats.Stats('profile_results.prof', stream=report_file)
        stats.strip_dirs()
        stats.sort_stats('cumulative')
        stats.print_stats()

    return out


if __name__ == "__main__":
    raise SystemExit(main())  # pragma: no cover
```
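
When the full report is too long to paste (the outputs above are marked as shortened), `pstats` can limit how many entries it prints. A sketch, assuming a toy workload in place of `find_files_missing_header()`; `top_entries` and `workload` are hypothetical names.

```python
import cProfile
import io
import pstats


def top_entries(profile: cProfile.Profile, limit: int = 20) -> str:
    """Return up to ``limit`` entries, sorted by cumulative time, as text."""
    stream = io.StringIO()
    stats = pstats.Stats(profile, stream=stream)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


def workload() -> int:
    # Toy stand-in for the real function being profiled.
    return sum(i * i for i in range(100_000))


profile = cProfile.Profile()
profile.enable()
workload()
profile.disable()

report = top_entries(profile, limit=5)
print(report)
```

Passing the `Profile` object straight to `pstats.Stats` also avoids the intermediate `.prof` file when only the text report is needed.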
germa89 commented 3 months ago

Related PR: https://github.com/ansys/pre-commit-hooks/pull/215

germa89 commented 3 months ago

Pinging @klmcadams @RobPasMue @SMoraisAnsys for feedback

germa89 commented 3 months ago

Using it on PyMAPDL repo:

Default

- Time-cprofiler: 129.175s
- Hook duration: 130.3s

Profiler details

``` Tue Aug 6 22:03:40 2024 profile_results.prof 36888095 function calls (34759341 primitive calls) in 129.175 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 33 0.000 0.000 199.393 6.042 connection.py:246(recv) 494/492 74.285 0.150 198.935 0.404 {built-in method posix.read} 68/66 0.001 0.000 198.931 3.014 connection.py:390(_recv) 82 0.001 0.000 165.286 2.016 pool.py:500(_wait_for_updates) 1 0.010 0.010 129.175 129.175 add_license_headers.py:630(find_files_missing_header) 1 0.114 0.114 129.060 129.060 add_license_headers.py:169(list_noncompliant_files) 1 0.014 0.014 128.850 128.850 lint.py:339(run) 1 0.025 0.025 124.803 124.803 report.py:274(generate) 1 0.000 0.000 124.718 124.718 pool.py:738(__exit__) 1 0.000 0.000 124.700 124.700 pool.py:654(terminate) 15 0.000 0.000 124.658 8.311 util.py:208(__call__) 1 0.000 0.000 124.658 124.658 pool.py:680(_terminate_pool) 34/33 0.000 0.000 124.651 3.777 connection.py:429(_recv_bytes) 1 0.000 0.000 124.650 124.650 pool.py:671(_help_stuff_finish) 1 0.000 0.000 124.650 124.650 {method 'acquire' of '_multiprocessing.SemLock' objects} 3/1 0.000 0.000 124.650 124.650 threading.py:1016(_bootstrap) 3/1 0.000 0.000 124.650 124.650 threading.py:1056(_bootstrap_inner) 3/1 0.000 0.000 124.650 124.650 threading.py:999(run) 1 0.000 0.000 124.650 124.650 pool.py:573(_handle_results) 1 0.000 0.000 124.649 124.649 pool.py:527(_handle_tasks) 82 0.001 0.000 116.806 1.424 pool.py:333(_maintain_pool) 166 0.007 0.000 115.496 0.696 connection.py:1122(wait) 34 0.100 0.003 74.649 2.196 {built-in method _pickle.loads} 354099 0.069 0.000 73.535 0.000 pathlib.py:1164(__new__) 170 0.002 0.000 41.190 0.242 selectors.py:402(select) 170 41.175 0.242 41.175 0.242 {method 'poll' of 'select.poll' objects} 40 0.000 0.000 25.465 0.637 connection.py:202(send) 1 0.010 0.010 6.826 6.826 pool.py:471(_map_async) 81251 0.178 0.000 6.816 0.000 project.py:160(all_files) 1 0.008 0.008 4.014 4.014 
lint.py:235(format_json) 91385 0.190 0.000 3.267 0.000 project.py:356(_is_path_ignored) 8812 0.055 0.000 2.600 0.000 :282(walk) 1 0.062 0.062 2.519 2.519 report.py:124(to_dict_lint) 253900 0.160 0.000 2.314 0.000 pathlib.py:407(_load_parts) 593487 0.247 0.000 2.230 0.000 pathlib.py:437(__str__) 273513 0.040 0.000 2.080 0.000 pathlib.py:835(stat) 273718 1.771 0.000 2.043 0.000 {built-in method posix.stat} 354099 0.131 0.000 2.041 0.000 pathlib.py:1157(__init__) 435349 1.330 0.000 1.979 0.000 pathlib.py:358(__init__) 690614 0.167 0.000 1.831 0.000 pathlib.py:551(drive) 91385 0.042 0.000 1.747 0.000 pathlib.py:886(is_file) 8814 1.603 0.000 1.603 0.000 {built-in method posix.scandir} 1 0.008 0.008 1.488 1.488 __init__.py:183(dumps) 1 0.130 0.130 1.480 1.480 encoder.py:183(encode) 157459 0.048 0.000 1.361 0.000 pathlib.py:524(__hash__) 1580783/1571919 0.187 0.000 1.335 0.000 encoder.py:414(_iterencode) 253900 0.770 0.000 1.323 0.000 pathlib.py:387(_parse_path) 4 0.041 0.010 1.322 0.331 report.py:394(files_without_copyright) 81251 0.051 0.000 1.304 0.000 pathlib.py:484(_str_normcase) 3253132/1571919 0.585 0.000 1.151 0.000 encoder.py:334(_iterencode_dict) 664 0.059 0.000 1.029 0.002 process.py:224(exitcode) 81250 0.088 0.000 0.974 0.000 report.py:532(to_dict_lint) 2010072/1571837 0.315 0.000 0.936 0.000 encoder.py:278(_iterencode_list) 100326/100279 0.888 0.000 0.897 0.000 {built-in method builtins.next} 181685 0.530 0.000 0.810 0.000 :71(join) 81250 0.032 0.000 0.791 0.000 pathlib.py:450(as_posix) 609339 0.090 0.000 0.739 0.000 pathlib.py:569(_tail) 91392 0.018 0.000 0.724 0.000 pathlib.py:583(name) 82 0.000 0.000 0.612 0.007 pool.py:289(_join_exited_workers) 6833382 0.450 0.000 0.450 0.000 {built-in method builtins.isinstance} 182776 0.085 0.000 0.413 0.000 pathlib.py:380(with_segments) 2719092 0.336 0.000 0.336 0.000 {built-in method sys.intern} 91387 0.054 0.000 0.330 0.000 pathlib.py:731(parent) 704 0.020 0.000 0.307 0.000 popen_fork.py:24(poll) 91387 0.019 0.000 
0.283 0.000 pathlib.py:719(__truediv__) 273524 0.067 0.000 0.271 0.000 pathlib.py:447(__fspath__) 91387 0.033 0.000 0.264 0.000 pathlib.py:711(joinpath) 91064 0.037 0.000 0.260 0.000 pathlib.py:909(is_symlink) 91387 0.035 0.000 0.259 0.000 pathlib.py:420(_from_parsed_parts) 2517627 0.223 0.000 0.223 0.000 {method 'append' of 'list' objects} 91064 0.018 0.000 0.217 0.000 pathlib.py:842(lstat) 345280 0.087 0.000 0.184 0.000 pathlib.py:429(_format_parsed_parts) 2852252/2852251 0.174 0.000 0.174 0.000 {built-in method posix.fspath} 880357 0.157 0.000 0.157 0.000 {method 'match' of 're.Pattern' objects} 45 0.013 0.000 0.151 0.003 connection.py:406(_send_bytes) 172667 0.099 0.000 0.144 0.000 pathlib.py:702(parts) 45 0.007 0.000 0.142 0.003 reduction.py:48(dumps) 707 0.141 0.000 0.141 0.000 {built-in method posix.waitpid} 61 0.060 0.001 0.136 0.002 {method 'dump' of '_pickle.Pickler' objects} 1131694 0.125 0.000 0.125 0.000 {method 'startswith' of 'str' objects} 83 0.000 0.000 0.124 0.001 queues.py:374(empty) 253900 0.086 0.000 0.123 0.000 :138(splitroot) ```

Using my branch

- Time-cprofiler: 40.348s
- Hook duration: 41.65s

Profiler details

``` Tue Aug 6 22:09:21 2024 profile_results.prof 5064605 function calls (4704160 primitive calls) in 40.348 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 34 0.000 0.000 85.648 2.519 pool.py:500(_wait_for_updates) 33 0.000 0.000 75.537 2.289 connection.py:246(recv) 34 0.000 0.000 56.673 1.667 pool.py:333(_maintain_pool) 116/112 12.439 0.107 52.031 0.465 {built-in method posix.read} 70/66 0.000 0.000 52.028 0.788 connection.py:390(_recv) 35/33 0.000 0.000 42.610 1.291 connection.py:429(_recv_bytes) 1 0.001 0.001 40.348 40.348 add_license_headers.py:630(find_files_missing_header) 1 0.011 0.011 40.232 40.232 add_license_headers.py:176(list_noncompliant_files) 1 0.002 0.002 40.211 40.211 lint.py:339(run) 1 0.005 0.005 39.723 39.723 report.py:274(generate) 1 0.000 0.000 39.698 39.698 pool.py:738(__exit__) 1 0.000 0.000 39.668 39.668 pool.py:654(terminate) 15 0.000 0.000 39.602 2.640 util.py:208(__call__) 1 0.000 0.000 39.602 39.602 pool.py:680(_terminate_pool) 1 0.000 0.000 39.593 39.593 pool.py:671(_help_stuff_finish) 1 0.000 0.000 39.593 39.593 {method 'acquire' of '_multiprocessing.SemLock' objects} 3/1 0.000 0.000 39.593 39.593 threading.py:1016(_bootstrap) 3/1 0.000 0.000 39.593 39.593 threading.py:1056(_bootstrap_inner) 3/1 0.000 0.000 39.593 39.593 threading.py:999(run) 1 0.000 0.000 39.593 39.593 pool.py:573(_handle_results) 1 0.000 0.000 39.592 39.592 pool.py:527(_handle_tasks) 35 0.015 0.000 35.559 1.016 {built-in method _pickle.loads} 71 0.008 0.000 35.496 0.500 connection.py:1122(wait) 75 0.002 0.000 26.058 0.347 selectors.py:402(select) 75 26.051 0.347 26.051 0.347 {method 'poll' of 'select.poll' objects} 48788 0.022 0.000 5.933 0.000 pathlib.py:1157(__init__) 1 0.002 0.002 0.772 0.772 pool.py:471(_map_async) 11827 0.026 0.000 0.770 0.000 project.py:160(all_files) 1 0.001 0.001 0.485 0.485 lint.py:235(format_json) 455 0.007 0.000 0.330 0.001 :282(walk) 12332 0.028 0.000 0.299 0.000 
project.py:356(_is_path_ignored) 1 0.007 0.007 0.280 0.280 report.py:124(to_dict_lint) 35999 0.023 0.000 0.271 0.000 pathlib.py:407(_load_parts) 83388 0.034 0.000 0.261 0.000 pathlib.py:437(__str__) 60614 0.145 0.000 0.259 0.000 pathlib.py:358(__init__) 457 0.222 0.000 0.222 0.000 {built-in method posix.scandir} 96706 0.026 0.000 0.207 0.000 pathlib.py:551(drive) 1 0.001 0.001 0.204 0.204 __init__.py:183(dumps) 1 0.018 0.018 0.203 0.203 encoder.py:183(encode) 250993/243819 0.028 0.000 0.183 0.000 encoder.py:414(_iterencode) 36 0.000 0.000 0.181 0.005 queues.py:374(empty) 35999 0.093 0.000 0.163 0.000 pathlib.py:387(_parse_path) 518467/243819 0.078 0.000 0.157 0.000 encoder.py:334(_iterencode_dict) 22733 0.007 0.000 0.154 0.000 pathlib.py:524(__hash__) 11827 0.007 0.000 0.146 0.000 pathlib.py:484(_str_normcase) 4 0.005 0.001 0.146 0.036 report.py:394(files_without_copyright) 40 0.000 0.000 0.141 0.004 connection.py:202(send) 36988 0.006 0.000 0.132 0.000 pathlib.py:835(stat) 37193 0.087 0.000 0.128 0.000 {built-in method posix.stat} 321919/243743 0.044 0.000 0.125 0.000 encoder.py:278(_iterencode_list) 11826 0.011 0.000 0.120 0.000 report.py:532(to_dict_lint) 84855 0.014 0.000 0.103 0.000 pathlib.py:569(_tail) 12914/12867 0.097 0.000 0.102 0.000 {built-in method builtins.next} 12339 0.003 0.000 0.100 0.000 pathlib.py:583(name) 11826 0.004 0.000 0.095 0.000 pathlib.py:450(as_posix) 45 0.009 0.000 0.091 0.002 connection.py:406(_send_bytes) 37 0.000 0.000 0.087 0.002 connection.py:253(poll) 12332 0.006 0.000 0.082 0.000 pathlib.py:886(is_file) 24849 0.047 0.000 0.082 0.000 :71(join) 280 0.003 0.000 0.076 0.000 process.py:224(exitcode) 893994 0.071 0.000 0.071 0.000 {built-in method builtins.isinstance} 16/6 0.000 0.000 0.068 0.011 :1349(_find_and_load) 16/6 0.000 0.000 0.068 0.011 :1304(_find_and_load_unlocked) 15/6 0.000 0.000 0.067 0.011 :911(_load_unlocked) 11/6 0.000 0.000 0.067 0.011 :989(exec_module) 24670 0.013 0.000 0.061 0.000 pathlib.py:380(with_segments) 319 
0.003 0.000 0.054 0.000 popen_fork.py:24(poll) 12334 0.008 0.000 0.049 0.000 pathlib.py:731(parent) 45 0.003 0.000 0.048 0.001 reduction.py:48(dumps) ```

RobPasMue commented 3 months ago

It is unclear to me what you are checking with each run and under which conditions, especially in the issue description (not the second comment).

What does "default" mean? Are you running the released version of the hook, or the main branch installed locally? Are you running it on PyMAPDL or on some other repo? Stacking changes incrementally might not be the best way to measure their impact on performance. I would measure each one separately.
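One way to follow this advice is to time each change in isolation against the same baseline, repeating every measurement so that run-to-run noise (for example, 1.227s versus 2.061s for the same configuration in the issue description) is visible. A minimal sketch; the variant functions are hypothetical placeholders for "run the hook with exactly one change applied".

```python
import statistics
import time
from typing import Callable


def measure(func: Callable[[], object], repeats: int = 5) -> tuple[float, float]:
    """Return (mean, standard deviation) of wall-clock seconds over ``repeats`` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical placeholders: each would run the hook with exactly one change applied.
def baseline() -> None:
    sum(range(200_000))


def skip_dirs_only() -> None:
    sum(range(50_000))


def no_json_file_only() -> None:
    sum(range(150_000))


variants = {
    "baseline": baseline,
    "skip dirs": skip_dirs_only,
    "no JSON file": no_json_file_only,
}
results = {name: measure(func) for name, func in variants.items()}
for name, (mean, stdev) in results.items():
    print(f"{name}: {mean:.4f}s ± {stdev:.4f}s")
```

Reporting the spread next to the mean makes it easier to tell a real improvement from noise between two runs.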

RobPasMue commented 3 months ago

Also, what is "your branch"? I am guessing https://github.com/ansys/pre-commit-hooks/pull/215 but you should have added the link next to "my branch" to make it clear 😄