Open AdnanTheExcellent opened 1 year ago
Thanks for the detailed report and your proposal. 👍 I agree that there are a bunch of straightforward performance improvements we should make to this.
By itself, that's an extremely expensive sql query. If a store has hundreds of thousands, or millions of orders, this could take a VERY long time, or freeze the thread.
completed_orders
collection should not be loaded until we call find_each
on it, which will only load batches of 1000 records at a time.
We've confirmed this by testing if completed_orders.loaded?
is true in a Rails console, and that wasn't the case. Here's a screenshot showing the result of that check right above the complete_orders.find_each do |order|
line:
Either require the start_date or have a date in the admin for when Taxjar started.
The assumption we made is that if the start date is not set when are backfilling transactions, the user wants to backfill all existing orders. For anyone who wants to narrow the set of orders, the expectation is that they will set the start and the end dates for backfilling.
Either require the start_date or have a date in the admin for when Taxjar started.
The assumption we made is that if the start date is not set when are backfilling transactions, the user wants to backfill all existing orders. For anyone who wants to narrow the set of orders, the expectation is that they will set the start and the end dates for backfilling.
You're right, if someone gives a start date. However, Taxjar only allows you to backfill up to the current calendar year or a few months for free. After that, you have to pay per year. It could be dangerous to offer an empty date as the default option. I would personally require to put a backfill date and set it as that so the user knows they have to make an explicit choice on the date.
I have been testing the backfills a bit over the last couple months on production and I have some thoughts on efficiency. Lets take a look at the code for the backfill batch job:
The first line:
complete_orders = ::Spree::Order.complete.where(shipment_state: 'shipped')
By itself, that's an extremely expensive sql query. If a store has hundreds of thousands, or millions of orders, this could take a VERY long time, or freeze the thread. This has happened to me. There needs to be some lower bound to the date. Either require the
start_date
or have a date in the admin for when Taxjar started. For instance, we've been using spree / solidus since Dec 2010, but started using Taxjar April 2021. We backfilled to Feb 2021 when we started. A good default date would be Feb 2021 for thetaxjar_start_date
. That would prevent 10+ years of orders from getting queried.next if order.taxjar_order_transactions.any?
this is an N+1 query since it fetches
taxjar_order_transactions
for each order in the loop. This can be resolved with anincludes
andwhere
what I propose:
thoughts?