Optimize CLI startup time

juhoinkinen commented 1 year ago

Optimizes CLI startup time by avoiding unnecessary imports for basic commands, mainly to improve the CLI command completion response times in #693.

Changes:

Imports of joblib and numpy are simply moved inside the functions/methods where they are used.
Import of connexion is avoided for all CLI commands but annif run and annif routes by using separate creation scripts for plain-Flask and Connexion-Flask applications. When started via gunicorn "annif:create_app() the Connexion-Flask app is created as previously.

This is a continuation of the optimizations in #514, #544 and #563.

I checked the import time of the pre-release of connexion 3, and it was even slower (~0.35 s) than of the current v2.14.* (0.12 s), so avoiding connexion import is even more valuable in the future.

Improvement compared to current main The improvement seems to be some 0.3 seconds (~0.1 s coming from each of joblib, numpy, and connexion).

Before (main):

$ time annif --version
1.0.0.dev0

real    0m0.610s
user    0m0.948s
sys 0m0.666s

$ time annif --help
Usage: annif [OPTIONS] COMMAND [ARGS]...
...
real    0m0.689s
user    0m1.079s
sys 0m0.626s

$ time annif list-vocabs 
Vocabulary ID       Languages                 Size  Loaded
----------------------------------------------------------
yso                 en,fi,se,sv              38179  True  
ykl                 -                            -  False 
thema-fi            -                            -  False 

real    0m1.214s
user    0m1.500s
sys 0m0.665s

After (PR):

$ time annif --version
1.0.0.dev0

real    0m0.293s
user    0m0.265s
sys 0m0.029s

$ time annif --help
Usage: annif [OPTIONS] COMMAND [ARGS]...
...
real    0m0.305s
user    0m0.266s
sys 0m0.038s

$ time annif list-vocabs 
Vocabulary ID       Languages                 Size  Loaded
----------------------------------------------------------
yso                 en,fi,se,sv              38179  True  
ykl                 -                            -  False 
thema-fi            -                            -  False 

real    0m0.945s
user    0m0.880s
sys 0m0.065s

The CLI command completion response times seemed to follow times of annif --help, so they are improved similarly.

Timing imports for completion responses does not show any easy targets for further optimization. The rdflib is used via multiple modules and in nearly all methods of corpus.subject.SubjectFileSKOS, and I don't know how rdflib import could be done only when needed in there in a nice way.

Tuna output for the timing of list-* completion response time:

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.03 :tada:

Comparison is base (3451bd3) 99.63% compared to head (e6e06b8) 99.66%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #696 +/- ## ========================================== + Coverage 99.63% 99.66% +0.03% ========================================== Files 89 89 Lines 6222 6257 +35 ========================================== + Hits 6199 6236 +37 + Misses 23 21 -2 ``` | [Impacted Files](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | | |---|---|---| | [annif/\_\_init\_\_.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvX19pbml0X18ucHk=) | `100.00% <100.00%> (+3.22%)` | :arrow_up: | | [annif/cli.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvY2xpLnB5) | `100.00% <100.00%> (+0.35%)` | :arrow_up: | | [annif/corpus/skos.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvY29ycHVzL3Nrb3MucHk=) | `100.00% <100.00%> (ø)` | | | [annif/corpus/subject.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvY29ycHVzL3N1YmplY3QucHk=) | `100.00% <100.00%> (ø)` | | | [tests/test\_cli.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/696?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-dGVzdHMvdGVzdF9jbGkucHk=) | `100.00% <100.00%> (ø)` | | ... and [3 files with indirect coverage changes](https://app.codecov.io/gh/NatLibFi/Annif/pull/696/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

juhoinkinen commented 1 year ago

The rdflib import could be turned to lazy, but the code is not super clean and the time saving is only ~0.05 s.

However I would still add it. For the command completions even such a tiny improvement in the response time is noticeable I think.

juhoinkinen commented 1 year ago

Seems that upgrading to soon-to-be-released Connexion 3 requires some changes to the way application is created, see https://github.com/NatLibFi/Annif/issues/689#issuecomment-1527508776. Hopefully the changes can be incorporated to this.

juhoinkinen commented 1 year ago

I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])) that doesn't change sys.argv at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?

The line could definitely be run in a test such as

def test_run_separate_process():
    result = os.popen("python annif/cli.py run --help").read()
    assert "Run a local development server." in result

But I don't know how to ensure this this really starts Flask with Connexion, and not just Flask... And codecov does not take this kind of test into account in coverage counting (at least locally run).

osma commented 1 year ago

The os.popen solution doesn't seem very useful, for the reasons you mention.

How about using some other information than just sys.argv[1] in cli.py when deciding whether to choose Connexion or plain Flask?

For testing parallel.py using different spawn methods, I added a constant definition like this: https://github.com/NatLibFi/Annif/blob/3451bd351273e3425a2bea683f8ac2bf3618278b/annif/parallel.py#L7-L10

The unit test that exercises this code overrides the value using monkeypatching: https://github.com/NatLibFi/Annif/blob/3451bd351273e3425a2bea683f8ac2bf3618278b/tests/test_cli.py#L887-L900

Something similar could perhaps be done here - define a constant in cli.py which uses sys.argv and then use it in the if clause:

USE_CONNEXION = len(sys.argv) > 1 and sys.argv[1] == "run"

if USE_CONNEXION:
    create_app = annif.create_app  # Use Flask with Connexion
else:
    # Connexion is not needed for most CLI commands, use plain Flask
    create_app = annif.create_flask_app

Then you can monkeypatch USE_CONNEXION in the unit test to force it to be True in any case.

osma commented 1 year ago

...except now that I think of it, I'm not sure if that will work, because monkeypatching is probably applied too late for this to have an effect. Anyway, something similar could work here.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

juhoinkinen commented 1 year ago

I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])) that doesn't change sys.argv at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?

The line could definitely be run in a test such as
def test_run_separate_process():
    result = os.popen("python annif/cli.py run --help").read()
    assert "Run a local development server." in result
But I don't know how to ensure this this really starts Flask with Connexion, and not just Flask... And codecov does not take this kind of test into account in coverage counting (at least locally run).

Actually using the os.popen solution does get the line 31 covered in codecov (when run in CI). When I locally ran coverage run -m pytest tests/test_cli.py; coverage report -m | grep annif/cli.py that line was reported as missed.

And I made also annif routes command to start the Connexion app, which registers all REST API endpoints. The list of the registered endpoints from the output of routes command is now tested to confirm that Connexion is started.

I think this can be merged. When upgrading to Connexion 3 (#702) this functionality can or may need to be changed again.

NatLibFi / Annif

Optimize CLI startup time #696

Codecov Report