fatkodima / sidekiq-iteration

Make your long-running sidekiq jobs interruptible and resumable.
https://rubydoc.info/gems/sidekiq-iteration
MIT License

Can I use sidekiq-iteration in a job that reads a remote file? #4

Closed: remy727 closed this issue 1 year ago

remy727 commented 1 year ago
# frozen_string_literal: true

require "json"
require "open-uri"

class BulkOperationDataRetrieveJob
  include Sidekiq::Job

  sidekiq_options queue: :bulk_operation_data_retrieve, retry: false

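  def perform(shop_domain, url)
    shop = Shop.find_by(shopify_domain: shop_domain)

    if shop.nil?
      logger.error("#{self.class} failed: cannot find shop with domain '#{shop_domain}'")
      return
    end

    read_file(shop, url)
  end

  private
    # Downloads the remote JSONL file to local disk, then processes it line by line.
    # `shop` is passed explicitly: it is a local variable in #perform, so the
    # private methods cannot see it otherwise.
    def read_file(shop, url)
      file_path = "tmp/customers.jsonl"
      # The block form closes the remote stream once it has been copied.
      URI.open(url) { |remote| IO.copy_stream(remote, file_path) }

      # Parse data file: one JSON object per line
      File.open(file_path) do |f|
        f.each do |line|
          process_line(shop, JSON.parse(line))
        end
      end
    end

    # Creates the customer unless one with this Shopify ID already exists for the shop.
    def process_line(shop, line)
      shopify_customer_id = line["id"].gsub("gid://shopify/Customer/", "").to_i
      shop.shopify_customers.find_or_create_by(shopify_id: shopify_customer_id) do |customer|
        customer.email = line["email"]
        customer.phone = line["phone"]
        customer.amount_spent = line["amountSpent"]["amount"].to_f
      end
    end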
end

I have the above job and the remote file contains 100K customers. Can I use sidekiq-iteration in this job?

fatkodima commented 1 year ago

Sure, but you need to figure out how to write a custom cursor for this (https://github.com/fatkodima/sidekiq-iteration/blob/master/guides/custom-enumerator.md), which is the hard part. One possible (if somewhat dumb) solution is to download the file locally, parse it, push the ids into a Redis list, write a custom Redis enumerator that iterates over this list, and use that enumerator in the job.
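The Redis-list idea can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not the gem's API: `InMemoryList` is a hypothetical stand-in for a Redis list (a real job would RPUSH the ids and read them back with LRANGE through a Redis client), and `list_enumerator` is a hypothetical helper following the custom-enumerator contract: yield each item together with its cursor, and on resume start right after the saved cursor, assuming that cursor marks the last item that finished processing.

```ruby
# Hypothetical stand-in for a Redis list. Only the one command the
# enumerator needs is mimicked here.
class InMemoryList
  def initialize(items)
    @items = items
  end

  # Mirrors Redis LRANGE for non-negative indexes: `stop` is inclusive and
  # out-of-range requests return an empty array.
  def lrange(start, stop)
    @items[start..stop] || []
  end
end

# Hypothetical helper: returns an Enumerator yielding [item, cursor] pairs,
# where the cursor is the item's index in the list. A nil cursor means
# "start from the beginning"; otherwise iteration resumes after the cursor.
def list_enumerator(list, cursor:, batch_size: 2)
  start = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    position = start # local copy, so the enumerator can be iterated more than once
    loop do
      # Read one batch instead of the whole list.
      batch = list.lrange(position, position + batch_size - 1)
      break if batch.empty?

      batch.each_with_index do |item, offset|
        yielder.yield(item, position + offset)
      end
      position += batch.size
    end
  end
end
```

In a job that includes `SidekiqIteration::Iteration`, `build_enumerator(..., cursor:)` would return such an enumerator, and `each_iteration` would then handle one id at a time. Reading in batches keeps the number of Redis round trips low without loading the whole list into memory.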

A similar discussion took place earlier in the parent gem (https://github.com/Shopify/job-iteration/issues/50).

remy727 commented 1 year ago

Sorry for the late reply.

Got it. But my Sidekiq jobs run on Heroku, and there's no way to guarantee that downloaded files (for example, tmp/files) will still exist when a job resumes.
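If the local copy cannot be relied on, one alternative (not necessarily what remy727 ended up building) is to skip the local file entirely: re-open the remote file each time the job starts or resumes, and use the line number as the cursor. A minimal sketch, assuming the file is JSONL and its contents do not change between runs; `jsonl_enumerator` and its `open_stream` argument are hypothetical names. The trade-off is that on resume the already-processed prefix is downloaded again and skipped, which costs bandwidth but needs no disk state.

```ruby
require "json"

# Hypothetical helper: iterates over a JSONL stream without a local file.
# `open_stream` is any callable returning an IO (e.g. -> { URI.open(url) }).
# The cursor is the 0-based number of the last processed line.
def jsonl_enumerator(open_stream, cursor:)
  skip_through = cursor || -1

  Enumerator.new do |yielder|
    io = open_stream.call
    begin
      io.each_line.with_index do |line, index|
        next if index <= skip_through   # already processed before the interruption
        next if line.strip.empty?       # tolerate blank lines

        yielder.yield(JSON.parse(line), index)
      end
    ensure
      io.close # release the stream once iteration ends
    end
  end
end
```

In the job, `build_enumerator(shop_domain, url, cursor:)` would presumably return `jsonl_enumerator(-> { URI.open(url) }, cursor: cursor)`, and `each_iteration(line, shop_domain, url)` would run the existing `process_line` logic for one customer.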

remy727 commented 1 year ago

I fixed this by building a custom iterator. Thank you!

fatkodima commented 1 year ago

@remy727 Can you please share the approach you finally decided on, or the iterator's code? It would be helpful for future readers and could perhaps be incorporated into the gem later.