ankane / or-tools-ruby

Operations research tools for Ruby
Apache License 2.0

Process crashing through webserver/sidekiq #21

Closed pmcnano closed 2 years ago

pmcnano commented 3 years ago

Hello @ankane, thanks for the gem.

I started using the gem a month ago while trying to speed up some heavy optimization processes. Our current code can take 30+ minutes to optimize our problem, and with this gem we reduced that to seconds.

I implemented the new solution and everything worked perfectly fine. I have specs in RSpec and all was great until today, when I tried to get the whole environment running and test everything end to end.

Our code runs through a REST API, and I started getting stack level too deep errors and some fatal crashes. The exact same code works perfectly in the console, but it fails through the server. I also tried Sidekiq and got the same crash.

I couldn't think of a better way to demonstrate this than showing you an actual example. Here's a complete Rails app with the gem: https://share.getcloudapp.com/7Kuov9dv (I also included a zip file with the fatal crashes I got at some point).

If you bundle the app and open the console, running Test.run will return an array of results, and Test.simple will return :optimal.

However, if you run the Rails server and open http://localhost:3000 you will get the stack level too deep error. As you can see, the TestController runs exactly the same thing, Test.run.

If you open http://localhost:3000/simple it works just fine. This shows that I have pinpointed exactly what crashes the code, but it is difficult for me to debug further because I can't replicate it in the console, and through the web server (using binding.pry) I get super weird issues.

Test.run basically reruns the solver several times, resetting the Solver object and my caching hashes.

Another weird behaviour I noticed while debugging: in the Test class, if you change Test#combinations so that its first line is return combine, the server index of course works, as it's now doing the same as Test.simple. However, if instead of the return you add a byebug there and open the index, the server will enter byebug, and if you then call combine from the byebug console, you get the stack level too deep error.

I apologize if I missed something or gave you too much data; this is driving me crazy.

Thanks!

ankane commented 3 years ago

Hey @pmcnano, can you share the key parts inline?

pmcnano commented 3 years ago

> Hey @pmcnano, can you share the key parts inline?

Hey, I'm sorry, but what would the key parts be for you? That's the thing: the code works, it just doesn't work through the web server. Since I don't have anything specific to share, I shared an app, but I guess I'll try.

From the test.rb file:

def combinations
  (1..all_warehouses.map(&:transit_days).max).map do |days|
    reset_variables
    @warehouses = all_warehouses.select { |w| w.transit_days <= days }

    combine
  end
end

That is apparently what breaks the code, as running #combine directly works fine. However, as I explained before, if you add a line like this:

def combinations
  return combine

  (1..all_warehouses.map(&:transit_days).max).map do |days|
    reset_variables
    @warehouses = all_warehouses.select { |w| w.transit_days <= days }

    combine
  end
end

of course it works, as it goes straight to #combine. However, if you debug inside:

def combinations
  byebug

  (1..all_warehouses.map(&:transit_days).max).map do |days|
    reset_variables
    @warehouses = all_warehouses.select { |w| w.transit_days <= days }

    combine
  end
end

and then call #combine from the debugger, it also gives the error.

ankane commented 3 years ago

Hey @pmcnano, try updating to the latest version (0.5.3) to see if that fixes it. If not, a few useful things to see inline are:

  1. The stack trace of a stack level too deep error
  2. The stack trace/output of a crash
  3. The Test class

pmcnano commented 3 years ago

Hey @ankane, I will update. However, I apologize, but I don't even remember what I did; we are successfully using this in production 😄.

I will close this issue. Thanks for your gem and your help! Have a wonderful week.

ankane commented 3 years ago

Good to hear. Have a great week as well.

pmcnano commented 2 years ago

Hey @ankane, I am actually reopening this because I found something new. I had kind of ignored it because it was working perfectly in production and hadn't affected me until now.

I just realized that the issue I'm experiencing only happens on macOS; the same test app I shared in the first post works perfectly fine on Ubuntu.

I am not sure how to further debug this issue. Do you have a Mac? Are you able to try the sample app I posted in the OP?

Edit: Found more things.

If you look at the Test class, the only difference between #combine and #combinations is that #combinations runs the solve multiple times. However, I tried just running #combine inside a block, any block, and it also crashes.

def combinations
  [1].map do |x|
    combine
  end
end

So it seems the issue is running the solve inside a block?

Edit2: More findings. I was curious why another part of my code was working fine. It turns out that running the code in a new thread works perfectly.

If in TestController#run you change the data value to Thread.new{Test.run}.value instead of just Test.run, it works just fine.
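
The workaround described above can be sketched in plain Ruby roughly as follows. Note the assumptions: the Test class here is only a stub standing in for the one in the sample app (the real Test.run calls the solver), and run_in_thread is a hypothetical helper name. Only Thread.new and Thread#value are standard Ruby. This shows the shape of the change; it does not explain why the thread avoids the crash.

```ruby
# Stub standing in for the Test class from the sample app (assumption).
# The real Test.run reruns the solver; this one just returns fixed data.
class Test
  def self.run
    [:optimal, :optimal]
  end
end

# Hypothetical helper: run the block in a fresh thread and wait for it.
# Thread#value joins the thread and returns the block's result; an
# exception raised inside the thread is re-raised in the caller here.
def run_in_thread(&block)
  Thread.new(&block).value
end

# Equivalent to writing Thread.new { Test.run }.value in the controller.
data = run_in_thread { Test.run }
p data
```

Because Thread#value blocks until the thread finishes, the calling code still waits for the full solve; the only change is which thread the solver runs on.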

I am super confused.

ankane commented 2 years ago

Hey @pmcnano, if you're able to create a minimal reproducible example (single file with the minimum amount of code to reproduce), I can spend some time on it. Feel free to create a new issue for it.
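
A single-file reproduction along these lines might start from a skeleton like the one below. This is only a sketch under assumptions: solve_once is a hypothetical stub that would be replaced by the real model-building and solve code, and the three call styles mirror the cases reported in this thread (direct call, inside a block, inside a new thread). A hard crash of the process cannot be rescued, so it would show up as the script aborting at that step instead.

```ruby
# Hypothetical stub: replace the body with the real solver call.
def solve_once
  :optimal
end

# SystemStackError is not a StandardError, so it is listed explicitly.
CAUGHT = [SystemStackError, StandardError].freeze

results = {}

# 1. Direct call, no block involved.
results[:direct] = begin
  solve_once
rescue *CAUGHT => e
  e.class
end

# 2. The same call inside a block, as in the [1].map example.
results[:in_block] = begin
  [1].map { solve_once }.first
rescue *CAUGHT => e
  e.class
end

# 3. The same call inside a new thread.
results[:in_thread] = begin
  Thread.new { solve_once }.value
rescue *CAUGHT => e
  e.class
end

# Print one line per call style: either the result or the error class.
results.each { |style, outcome| puts "#{style}: #{outcome.inspect}" }
```

Running the same file on each platform (for example macOS on Intel, macOS on M1, and Ubuntu) and comparing the three lines would show whether the call style or the platform is what changes the outcome.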

ankane commented 2 years ago

Just fyi, 0.6.2 may fix the issue. See #28 for more details.

pmcnano commented 2 years ago

Thanks @ankane, I honestly meant to submit something, but I have been really busy. Interestingly enough, the issues went away when I updated to the latest version ON INTEL; however, on my M1 they got worse. I was going to make a simple project for you, but I just didn't have the time.

I will test the new release on M1; if it works, it will be GREAT.

Thanks so much for your time.