SuperGoodSoft / solidus_taxjar

Support for using TaxJar to handle tax calculations in Solidus
BSD 3-Clause "New" or "Revised" License
12 stars 12 forks source link

BackfillTransactionSyncBatchJob sql queries are inefficient #247

Open AdnanTheExcellent opened 1 year ago

AdnanTheExcellent commented 1 year ago

I have been testing the backfills a bit over the last couple months on production and I have some thoughts on efficiency. Lets take a look at the code for the backfill batch job:

  def perform(transaction_sync_batch)
    complete_orders = ::Spree::Order.complete.where(shipment_state: 'shipped')
    if transaction_sync_batch.start_date
      complete_orders = complete_orders.where("completed_at >= ?", transaction_sync_batch.start_date.beginning_of_day)
    end
    if transaction_sync_batch.end_date
      complete_orders = complete_orders.where("completed_at <= ?", transaction_sync_batch.end_date.end_of_day)
    end
    complete_orders.find_each do |order|
      next if order.taxjar_order_transactions.any?
      SuperGood::SolidusTaxjar::ReportTransactionJob.perform_later(order, transaction_sync_batch)
    end
  end

The first line:

complete_orders = ::Spree::Order.complete.where(shipment_state: 'shipped')

By itself, that's an extremely expensive sql query. If a store has hundreds of thousands, or millions of orders, this could take a VERY long time, or freeze the thread. This has happened to me. There needs to be some lower bound to the date. Either require the start_date or have a date in the admin for when Taxjar started. For instance, we've been using spree / solidus since Dec 2010, but started using Taxjar April 2021. We backfilled to Feb 2021 when we started. A good default date would be Feb 2021 for the taxjar_start_date. That would prevent 10+ years of orders from getting queried.

next if order.taxjar_order_transactions.any?

this is an N+1 query since it fetches taxjar_order_transactions for each order in the loop. This can be resolved with an includes and where

what I propose:

  # put a lower bound on the start_date calendar select = to SuperGood::SolidusTaxjar.configuration.taxjar_start_date
  start_date = if transaction_sync_batch.start_date
                 transaction_sync_batch.start_date.beginning_of_day
               else
                 SuperGood::SolidusTaxjar.configuration.taxjar_start_date
               end
  complete_orders = ::Spree::Order.complete.
    includes(:taxjar_order_transactions).
    where("taxjar_order_transactions.order_id": nil).
    where(shipment_state: 'shipped').
    where("completed_at >= ?", start_date)
  if transaction_sync_batch.end_date
    complete_orders = complete_orders.where("completed_at <= ?", transaction_sync_batch.end_date.end_of_day)
  end
  complete_orders.each do |order|
    SuperGood::SolidusTaxjar::ReportTransactionJob.perform_later(order, transaction_sync_batch)
  end

thoughts?

benjaminwil commented 1 year ago

Thanks for the detailed report and your proposal. 👍 I agree that there are a bunch of straightforward performance improvements we should make to this.

senemsoy commented 1 year ago

By itself, that's an extremely expensive sql query. If a store has hundreds of thousands, or millions of orders, this could take a VERY long time, or freeze the thread.

completed_orders collection should not be loaded until we call find_each on it, which will only load batches of 1000 records at a time.

We've confirmed this by testing if completed_orders.loaded? is true in a Rails console, and that wasn't the case. Here's a screenshot showing the result of that check right above the complete_orders.find_each do |order| line:

Screenshot 2023-10-25 at 10 49 32 AM

Either require the start_date or have a date in the admin for when Taxjar started.

The assumption we made is that if the start date is not set when are backfilling transactions, the user wants to backfill all existing orders. For anyone who wants to narrow the set of orders, the expectation is that they will set the start and the end dates for backfilling.

AdnanTheExcellent commented 1 year ago

Either require the start_date or have a date in the admin for when Taxjar started.

The assumption we made is that if the start date is not set when are backfilling transactions, the user wants to backfill all existing orders. For anyone who wants to narrow the set of orders, the expectation is that they will set the start and the end dates for backfilling.

You're right, if someone gives a start date. However, Taxjar only allows you to backfill up to the current calendar year or a few months for free. After that, you have to pay per year. It could be dangerous to offer an empty date as the default option. I would personally require to put a backfill date and set it as that so the user knows they have to make an explicit choice on the date.