cebe / php-openapi

Read and write OpenAPI yaml/json files and make the content accessible in PHP objects.
MIT License
472 stars 92 forks source link

Major Performance issue even with new caching? #156

Open kohlerdominik opened 2 years ago

kohlerdominik commented 2 years ago

Hi @cebe

Almost 2 years ago we had a rather big issue with the performance for parsing an API-spec, where all components were store in another (the same) file. It took a couple of minutes to process. My collegue then proposed this PR #65

We ended up using this fork for the last two years now. This week I'm updating dependencies, and having seen that you merged #67 made me realize, that I can finally get rid of our fork.

However, running our test-suite after the update was rather painful: Instead of ~1s to parse our specification, like with our fork, it now took ~30s. I spent a couple of hours analyzing, but I couldn't find the issue. It looks like resolving references is painfully slow, it seems to take 100-300ms for each reference. Clearly, this was much faster in version 1.4, where our fork originates.


Therefore, I hope you can find a minute to look into it as well. I prepared a minimal test setup:

$ git clone https://github.com/lextira/openapi-parse-test
$ composer update

# now we have a composer install with our fork, see https://github.com/lextira/openapi-parse-test/blob/master/composer.json
$ php test.php
Start Reading...
Read took 1248 miliseconds!
Parsed file is 791837 characters long.

$ git checkout with-openapi-1.6
$ composer update

# new we are at the latest release of the official repository
$ php test.php
Start Reading...
Read took 37302 miliseconds!
Parsed File is 791837 characters long.

So with v1.6, reading the exact same file from the exact same URL takes almost 30 times longer than with v1.4 (and our custom cache implementation, which is very similar to your implemented cache now). For context, the specification used for the test is our official API documentation and can be found here: https://docs.lexgate.ch/api/

cebe commented 2 years ago

Thanks for the feedback. I also have performance issues on a big project of mine but I had no time to look into this yet. The fact that it takes much less time on your fork is some valuable info . I'll check that out.

kohlerdominik commented 2 years ago

Hi @cebe

Looks like the fix from @marcelthole makes things significantely faster here. Could you already have a look at it?

However, @marcelthole your fix introducing caching for references. But could you also find out, WHY resolving references is now so much slower than in 1.4?

marcelthole commented 2 years ago

not exactly. If i look into the diff one part is the huge amount of ReferenceContext::reduceDots calls in the current version.

Here the difference of the xdebug trace grafik

PatchRanger commented 1 year ago

I confirm performance issues. I have studied the code and have a suggestion: is it possible, in addition to the built-in ReferenceContextCache, to have a fallback to the user cache? Then the results of parsing and resolving $ref will become available between library calls - and this answers exactly the reason that creates performance problems, right? I mean, it's not enough to have a $ref-cache INSIDE one library call, it's necessary that the results are shared BETWEEN calls - and this is achieved by relying on a custom cache. In my opinion, it should be possible to pass to the new ReferenceContext user cache (for example, PSR-6 CacheItemPoolInterface). I suppose if a custom cache is passed to each ReferenceContext constructor, then this should solve problems with the performance of parsing and resolving $ref BETWEEN calls.

UPD I found that in my case, performance problems were caused by frequent cache misses due to https://gitlab.com/yii2-extended/yii2-psr6-cache-bridge/-/issues/3 . Therefore, please consider all mine of the above as an abstract feature suggestion, not a bug report.

NickSdot commented 1 year ago

Hi all.

I went through all the performance PRs and issues here, but none of them (1, 2) was really fully right or was fully solving the issue for me.

So, I went ahead and created my own fork (partly based on other suggestions here). Parsing the huge Stripe OpenApi Spec is back to ~1 second.

Fork: https://github.com/cebe/php-openapi/compare/master...NickSdot:cebe-php-openapi:improve-performance

An important fact, one major performance hit – in my case – is not solely caused by cebe/php-openapi but by a bug in symfony/yaml which we depend on here. I am trying to get this fixed upstream. It sadly is still not merged, so I currently override the class in question in my project.

Parsing the Stripe spec in fact ran for ages and eventually failed. The actual cause was kinda hard to spot. This is, because the cebe/php-openapi (understandably) was not validating for this external misbehaviour. Hence, I want to especially highlight this change in my fork https://github.com/cebe/php-openapi/commit/b65633f37e41798b6620a5bc109b8e95c09eb73f

Maybe someone is interested to try my fork and see if it will solve your issues as well. I am keen to PR this here or to any fork. Just let me know.

cc: @cebe, @kohlerdominik, @marcelthole, @aribros

Cheers

NickSdot commented 1 year ago

Quick heads up that the Symfony bug I was referring to in my previous comment got merged: https://github.com/symfony/symfony/pull/52408