Closed ahsandar closed 4 years ago
Hi @ahsandar . Actually I couldn't get the issue from your description. What's the problem with the query? Is something doesn't work or how?
this change of order by created_at seems to be causing the sort issue.
What is "sort issue" ? Long query? Wrong results? Memory / CPU bloat? :thinking:
If I would check process list
SELECT
*
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE command = 'Query'
ORDER BY db, time
it would show that DB is performing creating sort index
. while running the query
SELECT `oauth_access_tokens`.*
FROM `oauth_access_tokens`
WHERE `oauth_access_tokens`.`application_id` = 67 AND `oauth_access_tokens`.`resource_owner_id` IS NULL AND `oauth_access_tokens`.`revoked_at` IS NULL
ORDER BY `oauth_access_tokens`.`id` ASC
LIMIT 1000
already cleaned up the DB of old tokens to reduce the table size but this query seems to take too long causing high CPU usage if there are many records for a particular application_id
. Should there be a created_at
index on the oauth_access_tokens
for doorkeeper to run it efficiently as there are no mentions of it in change log and not part of default migrations either ?
It's for applications page, right? Or for something else?
For applications page you can yep, either add an index, or implement your own admin panel with any queuering you need.
Its for an api consumed by 3rd parties and JWT are issued to them for usage and they tend to generate new tokens without respecting to use issued token till expiry which causes multiple records valid in the table. Will try to add an index and update if that helps
the creating sort index
queries are now short lived after creating index on created_at
but the amount of data per query is too much and DB is stuck in writing to net
state causing high CPU which according to [MySQL docs](https://dev.mysql.com/doc/refman/5.7/en/general-thread-states.html
The server is writing a packet to the client. This state is called Writing to net prior to MySQL 5.7.8
This means anyone can abuse the oauth token and overload the DB
the difference between 4.2.5 and 5.2.0 for token lookup from the code is with legacy support reutrns 1
with use of first
so 4.2.5 was most likley returnung just 1
where as the current returns a list as per SQL query with a limit 1000
. There could be reasoning behind it but it does create a cascading problem
# Looking for not revoked Access Token records that belongs to specific
# Application and Resource Owner.
#
# @param application_id [Integer]
# ID of the Application model instance
# @param resource_owner_id [Integer]
# ID of the Resource Owner model instance
#
# @return [Doorkeeper::AccessToken] array of matching AccessToken objects
#
def authorized_tokens_for(application_id, resource_owner_id)
ordered_by(:created_at, :desc)
.where(application_id: application_id,
resource_owner_id: resource_owner_id,
revoked_at: nil)
end
# Convenience method for backwards-compatibility, return the last
# matching token for the given Application and Resource Owner.
#
# @param application_id [Integer]
# ID of the Application model instance
# @param resource_owner_id [Integer]
# ID of the Resource Owner model instance
#
# @return [Doorkeeper::AccessToken, nil] matching AccessToken object or
# nil if nothing was found
#
def last_authorized_token_for(application_id, resource_owner_id)
authorized_tokens_for(application_id, resource_owner_id).first
end
could this be causing the issue ?
its not in 5.1.0 will try that out and see if it works
just downgraded to 5.1.0 and seems the issue vanishes don't see it happening for now. will monitor and update if it happens but seems the regression is a 5.2.0 issue
I don't see here any created_at
SELECT `oauth_access_tokens`.*
FROM `oauth_access_tokens`
WHERE `oauth_access_tokens`.`application_id` = 67 AND `oauth_access_tokens`.`resource_owner_id` IS NULL AND `oauth_access_tokens`.`revoked_at` IS NULL
ORDER BY `oauth_access_tokens`.`id` ASC
LIMIT 1000
So why do we talk about adding index to it?
Also here (https://github.com/doorkeeper-gem/doorkeeper/blob/v5.2.1/lib/doorkeeper/models/access_token_mixin.rb#L89) we have sort by created_at
, but it's done using Ruby, not database.
And the goal of this changes was to minimize memory pollution, because before it just loads ALL the tokens from the database. You can check #1193
alright i thought some additional query uses created_at
as that was a change in 4.3.0 and we were on 4.2.5, for now 5.1.0 is working fine compared to 5.2.0. if 5.2.1 has fixed the memory bloat will give a try, thanks for the update
We had the same issue. We updated the Gem to 5.2.1
and the end-point /token
is taking too long to finish, more than 40 seconds, so we downgraded it to 5.1.0
and it fixed the issue.
The issue seems to only happen when having many oauth_access_tokens
records, our table has 1793120
records.
Checking the changelog the issue seems related to it: https://github.com/doorkeeper-gem/doorkeeper/pull/1270/files#diff-c42935829979fd1bcb5ed418c92f6386R93 but I couldn't confirm it.
I just added a comment to https://github.com/doorkeeper-gem/doorkeeper/pull/1271, but I think I can explain what happens:
Previously when fetching a token through client credentials flow, we would just use find_or_create_by
, which is pretty quick. See code here.
In 5.2, we always check for an existing token first, which essentially calls the method matching_token_for
mentioned above.
matching_token_for
will just be very slow if your existing database contains lots of unrevoked access tokens.
So upgrading from pre-5.2 to 5.2 is not a problem per se, but it is one, if your database already contains a large amount of unrevoked tokens (busy production databases using pre-5.2 doorkeeper will have that).
Does that make sense?
I think a possible workaround might be to run rake doorkeeper:db:cleanup
, which should delete most of the "revokable" access tokens because of their expiry time.
An additional note regarding that workaround: You should run that rake task with at least doorkeeper 5.1, since before it would lock up your database if it contained large amounts of tokens (my bad :( ).
@NobodysNightmare matching_token_for
might be the culprit but doing a db cleanup won't be solution for every case as I mentioned above for when we can't block/ratelimit the token issuing due to reasons beyond our control and one could overwhelm the system by generating too many tokens with validity period increasing with every token and the query time could increase exponentially with growth of the records.
for now 5.1.0 seems to be the solution for us to stick to to work it with rails 5.2.3.
thanks for the explanation around the issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@ahsandar @NobodysNightmare added token_lookup_batch_size
config option, now you can increase default value to any size you need (prior 5.2 Doorkeeper retrieved ALL the records from the database, then in batches of 1000 (and this cause this issue to be opened), and now it's 10 000 by defauls and is configurable)
If you need the behavior that was before Doorkeeper 5.2 - increase the value to 100_000 or higher.
Introduced in #1360 and would be backported to 5.2 , 5.3
UPD: Backported to 5.2.4 and 5.3.1, both released
tried with token_lookup_batch_size
1_000_000
though it is better than when the issue was reported it is still not really working out with P90 P95 averaging at over 10second
this takes over 500ms to run
SELECT oauth_access_tokens . * FROM oauth_access_tokens WHERE oauth_access_tokens . application_id = ? AND oauth_access_tokens . resource_owner_id IS ? AND oauth_access_tokens . revoked_at IS ? ORDER BY oauth_access_tokens . id ASC LIMIT ?
is there anything else to do to resolve this otherwise will stick with 5.1.0 . also Increasing this limit is causing high memory issue which never happens on 5.1.0.
The endpoint that spikes is 401 response to oauth/token
I can suggest only few options @ahsandar :
AccessToken
model and implement lookup yourself without batching
matching_token_for
(take a look at the version before "regression") - in this case no need to patch@nbulaj
Add database indexes for the query to make it more performant (also you could provide a PR for this into the gem)
Already added index on created_at
in oauth_access_token
table and it didn't help mentioned that earlier on the issue
Do a cleanup with cron or smth else to decrease the number of records (drop revoked and old tokens)
Already have this in place
Patch an AccessToken model and implement lookup yourself without batching Or upgrade to Doorkeeper 5.3/5.4 and configure custom Access Token class and override matching_token_for (take a look at the version before "regression") - in this case no need to patch
Last hope will give this a try when there is no option left to stay on 5.1.0
Thanks for the assistance
Hey @ahsandar, we were suffering the same pain and decided to move to 5.3.x with the idea of using a custom Access Token class as @nbulaj suggested. However, when doing this change, we realized that just doing the upgrade in our case was enough.
The #1353 PR changed the Client Credentials creation and it only does the expensive finding when the reuse_access_token
or revoke_previous_client_credentials_token
options are enabled. I hope that this could help you as well.
Thanks for all the info that you both shared here, was really useful to debug and find the problem 😸
Thanks @dsantosmerino for the update will give it a try 🤞
Thought the latency issue seems to be no more but the memory consumption is twice as much now using 5.4.0
we even have jemalloc but it makes no difference
also has high erratic CPU usage
This should fix the issue - #1542
@ahsandar is there any chance you could test it for your case?
@nbulaj will test it out this month and update. Thanks
@nbulaj its not relaesed yet ? I can't verify it as the issue is visible in production only not staging and can't test it out there to verify unless in a release.
Yeah, not released yet :disappointed:
@nbulaj i see the PR is merged is it now part of the latest release ?
It;s not released yet @ahsandar
@ahsandar released with the 5.6.0.rc1 version
will check the changelog and try to give this a try asap, thanks @nbulaj
Recently upgraded doorkeeper to 5.2.0 from 4.2.5 and it causes SQL issues with queries for oauth_access_token stuck in creating sort index for user that query to many tokens or request a token along every request and overwhelm the system by token generation. Along Doorkeeper we have moved to rails 5.2.3 hence the upgrade to doorkeeper. We are using Devise with doorkeeper and Devise got upgraded from 4.3.0 to 4.7.1. We are using doorkeeper-jwt also.
Doorkeeper 4.2.5 seems to work fine. Can't trace where the following query would be running from in Doorkeeper for 5.2.0 where as it is not the case when using 4.2.5 but I suppose it is to do with changes from
4.3.0
this change of order by
created_at
seems to be causing the sort issue. I might be wrong but this is most I could dig up.There are no recommendations on adding indexes on created_at in change log, so assuming that is not expected .
Expected behavior
Tell us what should happen Should not get stuck in
creating sort index
for fetchingoauth_acces_token
Actual behavior
Tell us what happens instead Gets stuck in
creating sort index
for client that abuse creating token endpoint as the queries pile up and the result is effected as table size grows.System configuration
You can help us to understand your problem if you will share some very useful information about your project environment (don't forget to remove any confidential data if it exists).
Doorkeeper initializer:
Ruby version:
2.6
Gemfile.lock:
Show me
``` GEM remote: https://rubygems.org/ specs: actioncable (5.2.3) actionpack (= 5.2.3) nio4r (~> 2.0) websocket-driver (>= 0.6.1) actionmailer (5.2.3) actionpack (= 5.2.3) actionview (= 5.2.3) activejob (= 5.2.3) mail (~> 2.5, >= 2.5.4) rails-dom-testing (~> 2.0) actionpack (5.2.3) actionview (= 5.2.3) activesupport (= 5.2.3) rack (~> 2.0) rack-test (>= 0.6.3) rails-dom-testing (~> 2.0) rails-html-sanitizer (~> 1.0, >= 1.0.2) actionview (5.2.3) activesupport (= 5.2.3) builder (~> 3.1) erubi (~> 1.4) rails-dom-testing (~> 2.0) rails-html-sanitizer (~> 1.0, >= 1.0.3) active_model_serializers (0.10.10) actionpack (>= 4.1, < 6.1) activemodel (>= 4.1, < 6.1) case_transform (>= 0.2) jsonapi-renderer (>= 0.1.1.beta1, < 0.3) activejob (5.2.3) activesupport (= 5.2.3) globalid (>= 0.3.6) activemodel (5.2.3) activesupport (= 5.2.3) activerecord (5.2.3) activemodel (= 5.2.3) activesupport (= 5.2.3) arel (>= 9.0) activestorage (5.2.3) actionpack (= 5.2.3) activerecord (= 5.2.3) marcel (~> 0.3.1) activesupport (5.2.3) concurrent-ruby (~> 1.0, >= 1.0.2) i18n (>= 0.7, < 2) minitest (~> 5.1) tzinfo (~> 1.1) addressable (2.6.0) public_suffix (>= 2.0.2, < 4.0) aes_key_wrap (1.0.1) arel (9.0.0) ast (2.4.0) aws-eventstream (1.0.3) aws-partitions (1.195.0) aws-sdk-core (3.61.2) aws-eventstream (~> 1.0, >= 1.0.2) aws-partitions (~> 1.0) aws-sigv4 (~> 1.1) jmespath (~> 1.0) aws-sdk-sns (1.3.0) aws-sdk-core (~> 3) aws-sigv4 (~> 1.0) aws-sigv4 (1.1.0) aws-eventstream (~> 1.0, >= 1.0.2) bcrypt (3.1.13) bindata (2.4.4) bootsnap (1.4.4) msgpack (~> 1.0) builder (3.2.3) byebug (11.0.1) capistrano (3.3.5) capistrano-stats (~> 1.1.0) i18n rake (>= 10.0.0) sshkit (~> 1.3) capistrano-bundler (1.6.0) capistrano (~> 3.1) capistrano-passenger (0.2.0) capistrano (~> 3.0) capistrano-secrets-yml (1.0.0) capistrano (>= 3.1) sshkit (>= 1.2.0) capistrano-stats (1.1.1) case_transform (0.2) activesupport chronic (0.10.2) coderay (1.1.2) concurrent-ruby (1.1.5) crass (1.0.4) database_cleaner (1.7.0) declarative (0.0.10) declarative-option (0.1.0) devise (4.7.1) bcrypt (~> 3.0) orm_adapter (~> 0.1) railties (>= 4.1.0) responders warden (~> 1.2.3) diff-lcs (1.3) docile (1.3.2) doorkeeper (5.2.0) railties (>= 5) doorkeeper-jwt (0.3.0) jwt (~> 2.1.0, >= 2.1.0) erubi (1.8.0) factory_bot (5.0.2) activesupport (>= 4.2.0) factory_bot_rails (5.0.2) factory_bot (~> 5.0.2) railties (>= 4.2.0) faker (1.9.6) i18n (>= 0.7) faraday (0.15.4) multipart-post (>= 1.2, < 3) faraday_middleware (0.13.1) faraday (>= 0.7.4, < 1.0) ffi (1.11.1) geoip (1.6.4) geokit (1.13.1) geokit-rails (2.3.1) geokit (~> 1.5) rails (>= 3.0) globalid (0.4.2) activesupport (>= 4.2.0) google-api-client (0.30.8) addressable (~> 2.5, >= 2.5.1) googleauth (>= 0.5, < 0.10.0) httpclient (>= 2.8.1, < 3.0) mini_mime (~> 1.0) representable (~> 3.0) retriable (>= 2.0, < 4.0) signet (~> 0.10) google-cloud-bigquery (1.12.0) concurrent-ruby (~> 1.0) google-api-client (~> 0.23) google-cloud-core (~> 1.2) googleauth (>= 0.6.2, < 0.10.0) mime-types (~> 3.0) google-cloud-core (1.3.0) google-cloud-env (~> 1.0) google-cloud-env (1.2.0) faraday (~> 0.11) googleauth (0.8.1) faraday (~> 0.12) jwt (>= 1.4, < 3.0) memoist (~> 0.16) multi_json (~> 1.11) os (>= 0.9, < 2.0) signet (~> 0.7) her (1.1.0) activemodel (>= 4.2.1) faraday (>= 0.8, < 1.0) multi_json (~> 1.7) httpclient (2.8.3) i18n (1.6.0) concurrent-ruby (~> 1.0) jaro_winkler (1.5.3) jmespath (1.4.0) json (2.2.0) json-jwt (1.10.2) activesupport (>= 4.2) aes_key_wrap bindata json_matchers (0.11.1) json_schema json_schema (0.20.7) jsonapi-renderer (0.2.2) jwt (2.1.0) kaminari (1.1.1) activesupport (>= 4.1.0) kaminari-actionview (= 1.1.1) kaminari-activerecord (= 1.1.1) kaminari-core (= 1.1.1) kaminari-actionview (1.1.1) actionview kaminari-core (= 1.1.1) kaminari-activerecord (1.1.1) activerecord kaminari-core (= 1.1.1) kaminari-core (1.1.1) koala (3.0.0) addressable faraday json (>= 1.8) listen (3.0.8) rb-fsevent (~> 0.9, >= 0.9.4) rb-inotify (~> 0.9, >= 0.9.7) lograge (0.11.2) actionpack (>= 4) activesupport (>= 4) railties (>= 4) request_store (~> 1.0) loofah (2.2.3) crass (~> 1.0.2) nokogiri (>= 1.5.9) lz4-ruby (0.3.3) mail (2.7.1) mini_mime (>= 0.1.1) marcel (0.3.3) mimemagic (~> 0.3.2) memoist (0.16.0) method_source (0.9.2) mime-types (3.2.2) mime-types-data (~> 3.2015) mime-types-data (3.2019.0331) mimemagic (0.3.3) mini_mime (1.0.2) mini_portile2 (2.4.0) minitest (5.11.3) monetize (1.9.2) money (~> 6.12) money (6.13.4) i18n (>= 0.6.4, <= 2) msgpack (1.3.0) multi_json (1.13.1) multipart-post (2.1.1) mysql2 (0.5.0) net-scp (2.0.0) net-ssh (>= 2.6.5, < 6.0.0) net-ssh (5.2.0) newrelic_rpm (6.5.0.357) nio4r (2.4.0) nokogiri (1.10.3) mini_portile2 (~> 2.4.0) orm_adapter (0.5.0) os (1.0.1) parallel (1.17.0) parser (2.6.3.0) ast (~> 2.4.0) pry (0.12.2) coderay (~> 1.1.0) method_source (~> 0.9.0) pry-rails (0.3.9) pry (>= 0.10.4) public_suffix (3.1.1) puma (4.0.1) nio4r (~> 2.0) rack (2.0.7) rack-cors (1.0.3) rack-test (1.1.0) rack (>= 1.0, < 3) rails (5.2.3) actioncable (= 5.2.3) actionmailer (= 5.2.3) actionpack (= 5.2.3) actionview (= 5.2.3) activejob (= 5.2.3) activemodel (= 5.2.3) activerecord (= 5.2.3) activestorage (= 5.2.3) activesupport (= 5.2.3) bundler (>= 1.3.0) railties (= 5.2.3) sprockets-rails (>= 2.0.0) rails-controller-testing (1.0.4) actionpack (>= 5.0.1.x) actionview (>= 5.0.1.x) activesupport (>= 5.0.1.x) rails-dom-testing (2.0.3) activesupport (>= 4.2.0) nokogiri (>= 1.6) rails-html-sanitizer (1.0.4) loofah (~> 2.2, >= 2.2.2) rails_param (0.10.2) railties (5.2.3) actionpack (= 5.2.3) activesupport (= 5.2.3) method_source rake (>= 0.8.7) thor (>= 0.19.0, < 2.0) rainbow (3.0.0) rake (12.3.3) rb-fsevent (0.10.3) rb-inotify (0.10.0) ffi (~> 1.0) rb-readline (0.5.5) redis (3.3.5) representable (3.0.4) declarative (< 0.1.0) declarative-option (< 0.2.0) uber (< 0.2.0) request_store (1.4.1) rack (>= 1.4) responders (3.0.0) actionpack (>= 5.0) railties (>= 5.0) retriable (3.1.2) rspec-collection_matchers (1.1.3) rspec-expectations (>= 2.99.0.beta1) rspec-core (3.8.2) rspec-support (~> 3.8.0) rspec-expectations (3.8.4) diff-lcs (>= 1.2.0, < 2.0) rspec-support (~> 3.8.0) rspec-json_expectations (2.2.0) rspec-mocks (3.8.1) diff-lcs (>= 1.2.0, < 2.0) rspec-support (~> 3.8.0) rspec-rails (3.8.2) actionpack (>= 3.0) activesupport (>= 3.0) railties (>= 3.0) rspec-core (~> 3.8.0) rspec-expectations (~> 3.8.0) rspec-mocks (~> 3.8.0) rspec-support (~> 3.8.0) rspec-support (3.8.2) rubocop (0.73.0) jaro_winkler (~> 1.5.1) parallel (~> 1.10) parser (>= 2.6) rainbow (>= 2.2.2, < 4.0) ruby-progressbar (~> 1.7) unicode-display_width (>= 1.4.0, < 1.7) ruby-progressbar (1.10.1) ruby_http_client (3.3.0) sendgrid-ruby (6.0.0) ruby_http_client (~> 3.3.0) sentry-raven (2.11.0) faraday (>= 0.7.6, < 1.0) signet (0.11.0) addressable (~> 2.3) faraday (~> 0.9) jwt (>= 1.5, < 3.0) multi_json (~> 1.10) simple_token_authentication (1.15.1) actionmailer (>= 3.2.6, < 6) actionpack (>= 3.2.6, < 6) devise (>= 3.2, < 6) simplecov (0.17.0) docile (~> 1.1) json (>= 1.8, < 3) simplecov-html (~> 0.10.0) simplecov-html (0.10.2) sprockets (3.7.2) concurrent-ruby (~> 1.0) rack (> 1, < 3) sprockets-rails (3.2.1) actionpack (>= 4.0) activesupport (>= 4.0) sprockets (>= 3.0.0) sshkit (1.19.1) net-scp (>= 1.1.2) net-ssh (>= 2.8.0) thor (0.20.3) thread_safe (0.3.6) tzinfo (1.2.5) thread_safe (~> 0.1) uber (0.1.0) unicode-display_width (1.6.0) versionist (2.0.1) activesupport (>= 3) railties (>= 3) yard (~> 0.9.20) warden (1.2.8) rack (>= 2.0.6) websocket-driver (0.7.1) websocket-extensions (>= 0.1.0) websocket-extensions (0.1.4) wego-money-bank (0.1.0) money (~> 6.9) whenever (1.0.0) chronic (>= 0.6.3) yard (0.9.20) PLATFORMS ruby DEPENDENCIES active_model_serializers (~> 0.10.10) bootsnap byebug cacheable! capistrano (~> 3.3.0) capistrano-bundler capistrano-passenger capistrano-secrets-yml (~> 1.0.0) database_cleaner devise doorkeeper doorkeeper-jwt factory_bot_rails faker faraday geoip geokit-rails globalize! google-cloud-bigquery jo_adapter! json-jwt json_matchers jwt kaminari koala listen (~> 3.0.5) lograge mysql2 (= 0.5.0) newrelic_rpm patrol! pry-rails puma rack-cors rails (~> 5.2.3) rails-controller-testing rails_param rb-readline rspec-collection_matchers rspec-json_expectations rspec-rails (~> 3.5) rubocop sendgrid-ruby sentry-raven simple_token_authentication simplecov versionist whenever BUNDLED WITH 1.17.2 ```