Closed nikromen closed 6 days ago
I ran the original script under a profiler:
python -m cProfile misc/copr_new_packages.py --since 2024-10-01
ncalls   tottime   percall    cumtime    percall  filename:lineno(function)
 311/1     0.003     0.000   1632.013   1632.013  {built-in method builtins.exec}
     1     0.001     0.001   1632.013   1632.013  copr_new_packages.py:1(<module>)
     1     0.000     0.000   1631.228   1631.228  copr_new_packages.py:109(main)
     1     0.015     0.015   1610.871   1610.871  copr_new_packages.py:30(pick_project_candidates)
  3091  1588.260     0.514   1593.087      0.515  {method 'read' of '_io.BufferedReader' objects}
  1948     0.007     0.000   1589.399      0.816  copr_new_packages.py:84(is_in_fedora)
  1948     0.023     0.000   1589.392      0.816  subprocess.py:417(check_output)
  1948     0.026     0.000   1589.365      0.816  subprocess.py:506(run)
  1948     0.020     0.000   1588.328      0.815  subprocess.py:1165(communicate)
    42     0.000     0.000     41.750      0.994  helpers.py:71(wrapper)
    32     0.002     0.000     41.716      1.304  requests.py:38(send)
    32     0.000     0.000     41.695      1.303  requests.py:49(_send_request_repeatedly)
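For anyone who wants to reproduce a listing like this from inside Python, here is a minimal sketch using the standard `cProfile` and `pstats` modules. The listing above appears to be ordered by cumulative time, so the sketch sorts the same way. The helper name `profile_top` and its `limit` parameter are made up for illustration and are not part of the script.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func() under cProfile and return the top entries by cumulative time.

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    # Collect the report into a string instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_top(lambda: sum(range(100_000))))
```

On the command line, `python -m cProfile -s cumtime <script>` should give a similarly sorted report.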
You are right that the is_in_fedora function is the biggest waste of time. A single call isn't that slow (about 0.8 s), but it is called many times (1948 in this run).
Instead of the proposed async approach, which would bombard Koji with thousands of parallel requests, I'd rather use something like this:
fedora-repoquery rawhide "*"
to get the list of all Fedora Rawhide packages at once (it takes 5-10s) and update is_in_fedora to check the presence of the package in the list.
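A rough sketch of what that could look like, not the actual patch. It assumes `fedora-repoquery rawhide "*"` prints one package per line in `name-[epoch:]version-release.arch` form; that output format is an assumption and should be checked against the real tool. The helper names `name_from_nevra`, `parse_package_names` and `fedora_package_names`, and the optional `fedora_packages` argument of `is_in_fedora`, are hypothetical.

```python
import subprocess
from functools import lru_cache


def name_from_nevra(line):
    """Extract the package name from 'name-[epoch:]version-release.arch'.

    Assumption: the line has this shape. The name may itself contain
    dashes, so version and release are split off from the right.
    """
    line = line.strip()
    parts = line.rsplit("-", 2)
    return parts[0] if len(parts) == 3 else line


def parse_package_names(output):
    """Turn the command output into a set of package names."""
    return {name_from_nevra(line) for line in output.splitlines() if line.strip()}


@lru_cache(maxsize=1)
def fedora_package_names():
    """Run the query once per process and cache the resulting set."""
    output = subprocess.check_output(
        ["fedora-repoquery", "rawhide", "*"], text=True
    )
    return parse_package_names(output)


def is_in_fedora(name, fedora_packages=None):
    """Set membership check instead of one subprocess call per package."""
    if fedora_packages is None:
        fedora_packages = fedora_package_names()
    return name in fedora_packages
```

The point is that the expensive call happens once, and each of the ~2000 lookups becomes a set membership test.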
TIL there's fedora-repoquery, nice.
> It isn't that slow but it's just called many times.
Not if you stick with fetching 1000 packages, but that only reaches packages that are at most about 10 days old, so the pool to choose from is really thin. If you want to cover everything from the latest Fedora Magazine article to today, you need more than 1000 packages (e.g. 10k), which takes ages.
> fedora-repoquery rawhide "*"
Really nice, I didn't know about this feature! This is even better and simpler.
1000 before:
time python misc/copr_new_packages.py --since 2024-03-01
.
.
.
________________________________________________________
Executed in 268.26 secs fish external
usr time 57.93 secs 0.00 micros 57.93 secs
sys time 7.46 secs 392.00 micros 7.46 secs
1000 after:
time python misc/copr_new_packages.py --since 2024-03-01
.
.
.
________________________________________________________
Executed in 27.79 secs fish external
usr time 2.33 secs 0.00 micros 2.33 secs
sys time 0.26 secs 396.00 micros 0.26 secs
10k before: takes hours, so I didn't run it.
10k after:
time python misc/copr_new_packages.py --since 2024-03-01 --limit 10000
.
.
.
________________________________________________________
Executed in 500.43 secs fish external
usr time 21.80 secs 0.00 micros 21.80 secs
sys time 1.44 secs 433.00 micros 1.44 secs
pls try :pray:
let's discuss this on planning first