fpgmaas / deptry

Find unused, missing and transitive dependencies in a Python project.
https://deptry.com/
MIT License
913 stars 19 forks

Speed up `deptry` by using Rust #580

Closed fpgmaas closed 8 months ago

fpgmaas commented 8 months ago

Is your feature request related to a problem? Please describe.

While deptry is relatively fast, it could probably be sped up by using Rust. For example, a quick test run of deptry on aws-cli:

Time taken to extract dependencies: 0.004 seconds
Time taken to find all python files: 0.026 seconds
Time taken to find all local and stdlib modules: 0.002 seconds
Scanning 216 files...
Time taken to find all imports: 0.190 seconds
Time taken to create the 'ModuleLocations' objects: 0.003 seconds
<omitted output>
Time taken to report: 0.003 seconds
Complete runtime: 0.228 seconds

Running deptry on deptry itself gives:

Time taken to detect dependency management format: 0.001 seconds
Assuming the corresponding module name of package 'types-colorama' is 'types_colorama'. Install the package or configure a package_module_name_map entry to override this behaviour.
Time taken to extract dependencies: 0.017 seconds
Time taken to find all python files: 0.033 seconds
Time taken to find all local and stdlib modules: 0.001 seconds
Scanning 45 files...
Time taken to find all imports: 0.015 seconds
Time taken to create the 'ModuleLocations' objects: 0.002 seconds

Success! No dependency issues found.
Time taken to report: 0.000 seconds
Complete runtime: 0.069 seconds

Here, we see that in a large project like aws-cli, 83% of the time (0.190 of 0.228 seconds) is spent on detecting the imports, i.e. reading the files, parsing each one into an AST, traversing the AST and collecting all Import and ImportFrom nodes. In a smaller project like deptry there does not seem to be one specific part of the application that contributes most to the duration of the run. But then again, deptry finishes in about 0.07 seconds there, which already sounds reasonably fast.
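To make the import-detection step concrete, here is a minimal sketch of that kind of extraction using only the standard library `ast` module. It is an illustration under assumptions, not deptry's actual implementation: the function name `top_level_imports` is hypothetical, and skipping relative imports is a simplification chosen for the example.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported anywhere in `source`.

    Hypothetical helper for illustration; not part of deptry.
    """
    modules: set[str] = set()
    # ast.walk visits every node, so imports nested in functions are found too.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` contributes "a" and "c"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import c` contributes "a"; relative imports
            # (level > 0) are skipped in this sketch
            modules.add(node.module.split(".")[0])
    return modules


example = """
import os.path, json
from collections.abc import Mapping
from . import sibling

def lazy():
    import requests
"""
print(sorted(top_level_imports(example)))
# → ['collections', 'json', 'os', 'requests']
```

Each file has to be read, parsed and walked like this, which is consistent with this step dominating the runtime on a project with a couple of hundred files.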

Describe the solution you would like

Let's see if we can speed up deptry by using Rust. Given the output of the small test runs above, the main target to replace with Rust seems to be the import extractors in deptry/deptry/imports.

Additional context

I will try to create an initial draft PR in the upcoming few days. I have 0 experience in Rust though, so I'll start with some tutorials and see where I get from there. Any more experienced Rust developers are welcome to contribute ;)

fpgmaas commented 8 months ago

I am running into quite a few issues trying to get Poetry & maturin to work together. In the small number of projects I could find that combine the two, they usually duplicate a lot of the project's metadata so that it is also available in the PEP 621 compatible format; see for example pyproject.toml in robyn.

Since we will likely need to make quite a few changes in our project to support maturin, my proposal would be that we switch from Poetry to PDM to manage our dependencies.

In that case, pyproject.toml would look like this

fpgmaas commented 8 months ago

I did some development on this over the weekend, and I just published a first draft to Test PyPI, see this workflow run. The results are quite promising. For each benchmark, I set up the environment with:

git clone --depth 1 git@github.com:aws/aws-cli.git
cd aws-cli
python -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt

Benchmark deptry 0.12.0

pip install deptry==0.12.0
hyperfine -i 'deptry .' --warmup 1
Benchmark 1: deptry .
  Time (mean ± σ):     298.6 ms ±   5.7 ms    [User: 274.8 ms, System: 21.9 ms]
  Range (min … max):   292.4 ms … 307.5 ms    10 runs

Benchmark deptry + Rust

pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    deptry==0.0.13a5
hyperfine -i 'deptry .' --warmup 1
Benchmark 1: deptry .
  Time (mean ± σ):     109.3 ms ±   1.7 ms    [User: 152.6 ms, System: 21.7 ms]
  Range (min … max):   107.2 ms … 115.7 ms    26 runs

I did a manual check on the output to confirm that the reduced runtime is not simply because of deptry exiting early on an error; in both cases, deptry scans 219 files and finds 151 dependency issues. So, on this particular project, that is a reduction of about 63% in runtime. I did not test on any other projects yet, but it is a promising start :)
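For reference, the 63% figure can be reproduced from the two hyperfine means reported above. This is just the arithmetic spelled out; the variable names are made up for the example.

```python
# Mean wall-clock times from the two hyperfine runs above, in milliseconds.
baseline_ms = 298.6  # deptry 0.12.0
rust_ms = 109.3      # deptry 0.0.13a5 (Rust-based import extraction)

reduction = (baseline_ms - rust_ms) / baseline_ms
speedup = baseline_ms / rust_ms

print(f"runtime reduction: {reduction:.1%}")  # → runtime reduction: 63.4%
print(f"speedup: {speedup:.2f}x")             # → speedup: 2.73x
```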