dbt-labs / hubcap

This app adds modules to the hubsite at hub.getdbt.com

Parallelize fetching version tags from GitHub #314

Open dbeatty10 opened 6 months ago

dbeatty10 commented 6 months ago

Currently, the hubcap.py script takes 15+ minutes to execute end-to-end.

The slowest part of the hubcap.py script is cloning each of the GitHub repos to determine whether there are any new tags.

This becomes a bottleneck that is constrained by the network, so we spend a lot of time waiting as we serially download each repo.

Is there any way we could parallelize this step so the script doesn't take so long to execute?

Most of the steps that can cause failures happen after this bottleneck is cleared, so during troubleshooting, we often have to wait 15+ minutes to find out whether a fix worked.
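
As a rough illustration of the idea, here is a minimal sketch of fetching tags for many repos concurrently with a thread pool. It is not how hubcap.py is currently written: the function names `list_remote_tags` and `fetch_all_tags` and the default of 8 workers are hypothetical, and the sketch swaps the full clone for `git ls-remote --tags`, which only lists tags. If later steps need the cloned contents, the same pool could run a clone function instead.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def list_remote_tags(repo_url):
    """Hypothetical helper: list tag names on a remote without cloning it."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", "--refs", repo_url],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "refs/tags/"
    tags = []
    for line in result.stdout.splitlines():
        # Each output line is "<sha>\t<ref>", e.g. "abc123\trefs/tags/0.1.0".
        _sha, ref = line.split("\t", 1)
        tags.append(ref[len(prefix):] if ref.startswith(prefix) else ref)
    return tags


def fetch_all_tags(repo_urls, fetch=list_remote_tags, max_workers=8):
    """Run `fetch` for every repo URL in parallel threads.

    Returns (tags_by_url, errors_by_url) so that one failing repo
    does not abort the whole run.
    """
    tags_by_url = {}
    errors_by_url = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in repo_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                tags_by_url[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors_by_url[url] = exc
    return tags_by_url, errors_by_url
```

Threads fit here because the work is network-bound, so the workers spend most of their time waiting on I/O rather than competing for the CPU. The worker count is a guess and would need tuning against GitHub rate limits.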