ddnexus / pagy

šŸ† The Best Pagination Ruby Gem šŸ„‡
https://ddnexus.github.io/pagy
MIT License
4.61k stars 411 forks source link

Bug: Pagy 8.4.0 is broken with MS SQL server #704

Closed erlingur closed 5 months ago

erlingur commented 5 months ago

šŸ‘€ Before submitting...

šŸ§ REQUIREMENTS

šŸ’¬ Description

Hi there, I noticed an issue that a very recent commit just introduced when using Pagy with a MS SQL Server. Passing a 0 into .limit() does not work with MS SQL server. It will give the following error: The number of rows provided for a FETCH clause must be greater then zero.

This is because MS SQL server does not support FETCH NEXT 0 ROWS ONLY.

Docs: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql?view=sql-server-ver16

Quote:

FETCH { FIRST | NEXT } { integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
Specifies the number of rows to return after the OFFSET clause has been processed.
The value can be an integer constant or expression that is greater than or equal to one.

Here is a Rackup file as requested that demonstrates the issue. I can't spin up a SQL server for you in this file, sorry. You can launch it with a Docker command: sudo docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Password.1" -p 1433:1433 --name sql1 --hostname sql1 -d mcr.microsoft.com/mssql/server:2022-latest

rails.ru:

# frozen_string_literal: true

# Starting point to reproduce rails related pagy issues

# DEV USAGE
#    pagy clone rails
#    pagy ./rails.ru

# URL
#    http://0.0.0.0:8000

# HELP
#    pagy -h

# DOC
#    https://ddnexus.github.io/pagy/playground/#2-rails-app

VERSION = '8.4.0'

# Gemfile
require 'bundler/inline'
require 'bundler'
Bundler.configure
gemfile(ENV['PAGY_INSTALL_BUNDLE'] == 'true') do
  source 'https://rubygems.org'
  gem 'oj'
  gem 'puma'
  gem 'rails'
  gem 'tiny_tds'
  gem 'activerecord-sqlserver-adapter'
end

# require 'rails/all'     # too much stuff
require 'action_controller/railtie'
require 'active_record'

OUTPUT = Rails.env.showcase? ? IO::NULL : $stdout

# Rails config
class PagyRails < Rails::Application # :nodoc:
  config.root = __dir__
  config.session_store :cookie_store, key: 'cookie_store_key'
  Rails.application.credentials.secret_key_base = 'absolute_secret'

  config.logger = Logger.new(OUTPUT)
  Rails.logger  = config.logger

  routes.draw do
    root to: 'comments#index'
    get '/javascript' => 'pagy#javascript'
  end
end

# AR config
dir = Rails.env.development? ? '.' : Dir.pwd  # app dir in dev or pwd otherwise
unless File.writable?(dir)
  warn "ERROR: directory #{dir.inspect} is not writable (the pagy-rails-app needs to create DB files)"
  exit 1
end

# Pagy initializer
require 'pagy/extras/pagy'
require 'pagy/extras/items'
require 'pagy/extras/overflow'
Pagy::DEFAULT[:size]     = [1, 4, 4, 1]
Pagy::DEFAULT[:items]    = 10
Pagy::DEFAULT[:overflow] = :empty_page
Pagy::DEFAULT.freeze

connection_options = {adapter: 'sqlserver', username: "sa", password: "Password.1", dataserver: "localhost"}
ActiveRecord::Base.establish_connection(connection_options)
database = 'pagy_repro'
begin
  ActiveRecord::Base.connection.create_database(database)
rescue ActiveRecord::StatementInvalid
  # already exists ...
end
ActiveRecord::Base.establish_connection(connection_options.merge(database: database))

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string :title
  end

  create_table :comments, force: true do |t|
    t.string :body
    t.integer :post_id
  end
end

# Models
class Post < ActiveRecord::Base # :nodoc:
  has_many :comments
end # :nodoc:

class Comment < ActiveRecord::Base # :nodoc:
  belongs_to :post
end # :nodoc:

# Don't seed, it needs to have 0 records to reproduce the issue
# 1.upto(11) do |pi|
#   Post.create(title: "Post #{pi + 1}")
# end
# Post.all.each_with_index do |post, pi|
#   2.times { |ci| Comment.create(post:, body: "Comment #{ci + 1} to Post #{pi + 1}") }
# end

# Down here to avoid logging the DB seed above at each restart
ActiveRecord::Base.logger = Logger.new(OUTPUT)

# Helpers
module CommentsHelper
  include Pagy::Frontend
end

# Controllers
class CommentsController < ActionController::Base # :nodoc:
  include Rails.application.routes.url_helpers
  include Pagy::Backend

  def index
    @pagy, @comments = pagy(Comment.all)
    render inline: TEMPLATE
  end
end

# You don't need this in real rails apps (see https://ddnexus.github.io/pagy/docs/api/javascript/setup/#2-configure)
class PagyController < ActionController::Base
  def javascript
    file = "pagy#{'-dev' if ENV['DEBUG']}.js"
    render js: Pagy.root.join('javascripts', file).read
  end
end

run PagyRails

TEMPLATE = <<~ERB
  <!DOCTYPE html>
  <html lang="en">
    <html>
    <head>
    <title>Pagy Rails App</title>
      <script src="/javascript"></script>
      <script>
        window.addEventListener("load", Pagy.init);
      </script>
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <style type="text/css">
        @media screen { html, body {
          font-size: 1rem;
          line-heigth: 1.2s;
          padding: 0;
          margin: 0;
        } }
        body {
          background: white !important;
          margin: 0 !important;
          font-family: sans-serif !important;
        }
        .content {
          padding: 1rem 1.5rem 2rem !important;
        }

        /* Quick demo for overriding the element style attribute of certain pagy helpers
        .pagy input[style] {
          width: 5rem !important;
        }
        */

        /*
          If you want to customize the style,
          please replace the line below with the actual file content
        */
        <%== Pagy.root.join('stylesheets', 'pagy.css').read %>
      </style>
    </head>

    <body>

      <div class="content">
        <h1>Pagy Rails App</h1>
        <p> Self-contained, standalone Rails app usable to easily reproduce any rails related pagy issue.</p>
        <p>Please, report the following versions in any new issue.</p>
        <h2>Versions</h2>
        <ul>
          <li>Ruby:  <%== RUBY_VERSION %></li>
          <li>Rack:  <%== Rack::RELEASE %></li>
          <li>Rails: <%== Rails.version %></li>
          <li>Pagy:  <%== Pagy::VERSION %></li>
        </ul>

        <h3>Collection</h3>
        <div id="records" class="collection">
        <% @comments.each do |comment| %>
          <p style="margin: 0;"><%= comment.body %></p>
        <% end %>
        </div>
        <hr>

        <h4>pagy_nav</h4>
        <%== pagy_nav(@pagy, id: 'nav', aria_label: 'Pages nav') %>

        <h4>pagy_nav_js</h4>
        <%== pagy_nav_js(@pagy, id: 'nav-js', aria_label: 'Pages nav_js') %>

        <h4>pagy_combo_nav_js</h4>
        <%== pagy_combo_nav_js(@pagy, id: 'combo-nav-js', aria_label: 'Pages combo_nav_js') %>

        <h4>pagy_items_selector_js</h4>
        <%== pagy_items_selector_js(@pagy, id: 'items-selector-js') %>

        <h4>pagy_info</h4>
        <%== pagy_info(@pagy, id: 'pagy-info') %>
      </div>

    </body>
  </html>
ERB

I've tried my best to make this as easily reproducible but because of the nature of the issue this is the best I could come up with. Hope it works.

benkoshy commented 5 months ago

Thanks for your report.

I was trying to reproduce it with your chosen database and with the command specified:

`Database 'pagy_repro' does not exist. Make sure that the name is entered correctly. (TinyTds::Error)```

if you have the code handy, adding it in the above lines would help inordinately?

ddnexus commented 5 months ago

I've tried my best to make this as easily reproducible but because of the nature of the issue this is the best I could come up with.

You certainly did @erlingur ! Thank you for your report.

That commits should have improved the performance of a probably quite rare case, and it has been added thinking that it wouldn't hurt anything else.

Unluckily that is almost never the case šŸ¤·.

At worse we will switch back to pass the items and add a troubleshooting entry explaining how to improve the performance by overriding, only when really needed.

BTW, I didn't try to run your command, so you may still want to help @benkoshy to reproduce it.

Thank you

ddnexus commented 5 months ago

@jules-w2

jules-w2 commented 5 months ago

Hey @ddnexus

I'm not sur it's a "rare" case. the case occurs on the last page when there are millions of entries in the database and the "limit" is greater than the results which remain to be displayed

I think it would be easy to solve all cases with:

collection.offset(pagy.offset).limit(pagy.in >Ā 0 ? pagy.in :  pagy.items )

Bests

ddnexus commented 5 months ago

@joules-w2 about the rarity of the case: in practice we have no idea one way or another.

But my educated guess would suggest that it could be your own setup, app, DB type or settings or query, in combination with who knows what... even a bug in the particular DB version/driver/ORM. Because one thing is certain: there have been no other complaint EVER in 7 years so far.

And when there are so many possible combinations, no other complaints ever, no objective reproducible culprit that we can pin down, I am comfortable to consider that it IS a rare case until proven common.

So adding a condition for every request, of every app, forever! and without any objective measurable reason nor proof, also considering that the benefit is only to improve the performance of the very last page in DBs with zillions of records and the right condition and star alignments... it's quite against my religion šŸ¤·

Of course, if you can prove that the change is worth it, I will be happy to change my mind.

erlingur commented 5 months ago

@benkoshy @ddnexus Thank you for taking a look! I've update the rails.ru file so it should now create the database for you if it doesn't exist. It works on my machine so hopefully it should work on yours :)

jules-w2 commented 5 months ago

@ddnexus

as soon as I find free time, I will look in depth at the origin of the slowness to understand exactly what is happening

benkoshy commented 5 months ago

Thank you for taking a look! I've update the rails.ru file so it should now create the database for you if it doesn't exist. It works on my machine so hopefully it should work on yours :)

Excellent: I can confirm reproduction of the issue.

DEBUG -- :    (1.5ms)  IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
INFO -- : Started GET "/" for 127.0.0.1 at 2024-05-24 06:06:23 +1000
DEBUG -- :   Comment Count (1.4ms)  SELECT COUNT(*) FROM [comments]
DEBUG -- :   Comment Load (2.5ms)  EXEC sp_executesql N'SELECT [comments].* FROM [comments] ORDER BY [comments].[id] ASC OFFSET @0 ROWS FETCH NEXT @1 ROWS ONLY', N'@0 int, @1 int', @0 = 0, @1 = 0  [["OFFSET", nil], ["LIMIT", nil]]
F, [2024-05-24T06:06:23.977388 #405730] FATAL -- :   
ActionView::Template::Error (TinyTds::Error: The number of rows provided for a FETCH clause must be greater then zero.):
    54: 
    55:       <h3>Collection</h3>
    56:       <div id="records" class="collection">
    57:       <% @comments.each do |comment| %>
    58:         <p style="margin: 0;"><%= comment.body %></p>
    59:       <% end %>
    60:       </div>
ddnexus commented 5 months ago

Thank you @benkoshy

In order to avoid any surprise, pease could you check also the other way around in the same environment, i.e. changing pagy.in back to pagy.items and trying the exact same query?

benkoshy commented 5 months ago

Thank you @benkoshy

In order to avoid any surprise, pease could you check also the other way around in the same environment, i.e. changing pagy.in back to pagy.items and trying the exact same query?

As correctly predicted: using items simply works: even though there are no items in the collection. (I tested via overridding in the script rather than referencing a fork of the gem with pagy.items)

# Controllers
class CommentsController < ActionController::Base # :nodoc:
  include Rails.application.routes.url_helpers
  include Pagy::Backend

  def index
    @pagy, @comments = pagy(Comment.all)
    render inline: TEMPLATE
  end

    def pagy_get_items(collection, pagy)
      collection.offset(pagy.offset).limit(pagy.items)      
    end
end

image

DEBUG -- :   Comment Count (2.2ms)  SELECT COUNT(*) FROM [comments]
DEBUG -- :   Comment Load (2.8ms)  EXEC sp_executesql N'SELECT [comments].* FROM [comments] ORDER BY [comments].[id] ASC OFFSET @0 ROWS FETCH NEXT @1 ROWS ONLY', N'@0 int, @1 int', @0 = 0, @1 = 10  [["OFFSET", nil], ["LIMIT", nil]]
127.0.0.1 - - [24/May/2024:08:51:54 +1000] "GET / HTTP/1.1" 200 - 6.4252
I, [2024-05-24T08:51:54.192835 #440730]  INFO -- : Started GET "/javascript" for 127.0.0.1 at 2024-05-24 08:51:54 +1000
127.0.0.1 - - [24/May/2024:08:51:54 +1000] "GET /javascript HTTP/1.1" 200 - 0.0550
ddnexus commented 5 months ago

Thank you @benkoshy.

At this point, while waiting to know more about the actual cause of the last page delay in @jules-w2 case, we should go back to pagy.items.

We should also create a new entry in the troubleshooting page, referencing the case and the overriding.

jules-w2 commented 5 months ago

Hey @ddnexus

After further investigation, I found that this issue does not originate from Rails, Ransack, Pagy, etc., but directly from MySQL. You should be able to reproduce it on your side easily if you have a table with a large amount of data.

I am working with MariaDB 11.2.3:

SELECT VERSION();
+----------------+
| VERSION()      |
+----------------+
| 11.2.3-MariaDB |
+----------------+
1 row in set (0.000 sec)

I have 14,553 results for my query:

SELECT COUNT(*) FROM `aws_emails` WHERE (`aws_emails`.`event_type` = 'Delivery' AND `aws_emails`.`created_at` >= '2024-05-01 04:00:00');
+----------+
| COUNT(*) |
+----------+
|    14553 |
+----------+
1 row in set (0.380 sec)

With limit = items.in (in this case 3)

SELECT DISTINCT * FROM `aws_emails` WHERE (`aws_emails`.`event_type` = 'Delivery' AND `aws_emails`.`created_at` >= '2024-05-01 04:00:00') ORDER BY `aws_emails`.`id` DESC LIMIT 3 OFFSET 14550;
3 rows in set (0.377 sec)

The same code with limit 4

SELECT DISTINCT * FROM `aws_emails` WHERE (`aws_emails`.`event_type` = 'Delivery' AND `aws_emails`.`created_at` >= '2024-05-01 04:00:00') ORDER BY `aws_emails`.`id` DESC LIMIT 4 OFFSET 14550;
3 rows in set (51.727 sec)

@ddnexus are you able to reproduce this case ?

jules-w2 commented 5 months ago

I found the reason :

"MariaDB has to find all [OFFSET+LIMIT] rows, step over the first [OFFSET], then deliver the [LIMIT] for that distant page."

https://emmer.dev/blog/the-dangers-of-offset-with-mysql https://mariadb.com/kb/en/pagination-optimization/#the-problem https://planetscale.com/blog/mysql-pagination https://github.com/planetscale/fast_page?tab=readme-ov-file#pagy

ddnexus commented 5 months ago

@jules-w2 great references!

I preliminarily read it quickly and skipping around, and I may have missed something, but your specific case seems to be different (and more extreme) than the usual known progressive performance hit on large tables.

You have a huge difference only on the last page that doesn't seem to be related to the OFFSET as much as related to the LIMIT.

I see the charts in the first link you posted and that is very much a constant increase, related to the need to skip over a huge number of rows (OFFSET), not a huge spike on the very last page, which in your case can be totally removed with the right LIMIT.

Did I miss something?

BTW, could you try the performance by following https://ddnexus.github.io/pagy/docs/how-to/#consider-the-arel-extra-and-or-the-fast-page-gem ?

ddnexus commented 5 months ago

@jules-w2 please could you try the query on the last page with the usual OFFSET but without the LIMIT?

And just a confirmation: are you running the query in the MySQL console?

jules-w2 commented 5 months ago

@ddnexus

The problem when looking for [OFFSET+LIMIT] is that if [OFFSET + LIMIT] is greater than the total number of rows, MySQL will look for rows that do not exist.. (and we must enter internal timeouts of MySQL)

This is the typical case which arrives on the last page.

I run the query in MySQL console just to understand the problem. In practice I use ActiveRecord + Ransasck + Pagy.

I will test the 2 recommendations and I will come back to you with the results

ddnexus commented 5 months ago

MySQL console just to understand the problem

and that is perfect because it excludes any possible problem with the rest of the environment

I will test the 2 recommendations and I will come back to you with the results

They are together but I think that fastpage is probably the one that could make some difference, but I am just guessing. It would be wise to try one, the other and both together (if possible).

ddnexus commented 5 months ago

MySQL will look for rows that do not exist..

A couple of things to unpack here:

  1. I didn't find any description of any (OFFSET+LIMIT) > COUNT problem reaching "rows that do not exist" in any of your links. Did I miss something? They all describe just the usual performance hit with skipping over many rows, so I guess that explaining your case with "looking for rows that don't exist" is your own deduction (which seems reasonable, but yet unconfirmed).

  2. It would be very "out of character" for a DB to be so dumb and unaware of the count of a table to get stuck on the last few rows for 1 minute, searching for rows that don't exist. Usually a DB should be pretty resilient to what a query may ask for, especially about data that "don't exist". I am starting to believe that it might be a DB bug.

Whatever is the cause of the problem, at this point it seems quite established that it's external to ruby and pagy, so documenting any useful workaround should be more than enough for the pagy scope.