elixir-lang / gen_stage

Producer and consumer actors with back-pressure for Elixir
http://hexdocs.pm/gen_stage

`Endpoint.config_change/2` causes `undefined handle_info` #278

Closed sb8244 closed 2 years ago

sb8244 commented 2 years ago

I'm hoping that this is the right repo to report in. This is a very strange issue for me.

I have a fairly vanilla application that I started using Phoenix + GenStage. My GenStage producer defines handle_info/2 to accept push-based messages into a buffer. These messages come from Phoenix.PubSub.

This was all working well with recompile. I added in LiveView and started building out my UI. Now, after I change the view and the PubSub is invoked, the producer logs:

00:31:04.871 [warn] ** Undefined handle_info in MyApp.AgentQueue.Producer
** Unhandled message: %MyApp.Schema.PendingLookup{

I tracked it down: if I comment out MyAppWeb.Endpoint.config_change(changed, removed) in the application module, it doesn't get into this state. It also does not occur if I use recompile on its own.

edit: this worked for quite a few refreshes, but eventually it started happening again. I decided to check function_exported?, which is what GenStage uses:

iex(4)> function_exported?(AgentQueue.Producer, :handle_info, 2)
false
iex(5)> AgentQueue.Producer.__info__(:functions)                
[
  child_spec: 1,
  enqueue: 2,
  handle_call: 3,
  handle_demand: 2,
  handle_info: 2,
  init: 1,
  name: 1,
  prioritize: 2,
  start_link: 1
]
iex(6)> function_exported?(AgentQueue.Producer, :handle_info, 2)
true

I can't explain this behavior. There was no recompilation going on between prompts 4 and 6.

josevalim commented 2 years ago

What happens is that recompile removes all code from your app, so when you receive a message, the code won’t be loaded and function_exported? won’t load it either.

This will happen with a GenServer too. There isn’t really a fix; it is a consequence of recompile. The same would happen if you changed the GenServer state: the GenServer currently running would have the old state and likely error.

What is the result of :code.is_loaded for the module just after the recompile call?
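To make the point about function_exported? concrete, here is a minimal sketch of a check that loads the module first. The LoadedCheck module and its exports?/3 function are hypothetical names used only for illustration; they are not part of GenStage or Phoenix. It relies on Code.ensure_loaded?/1, which attempts to load the module, whereas function_exported?/3 only inspects modules that are already loaded.

```elixir
defmodule LoadedCheck do
  # Hypothetical helper, not part of GenStage or Phoenix.
  # Code.ensure_loaded?/1 tries to load the module if it is not loaded yet;
  # function_exported?/3 on its own never triggers loading.
  def exports?(module, fun, arity) do
    Code.ensure_loaded?(module) and function_exported?(module, fun, arity)
  end
end

# :code.is_loaded/1 returns {:file, path} for a loaded module and false otherwise.
IO.inspect(:code.is_loaded(String), label: "String loaded?")
IO.inspect(LoadedCheck.exports?(String, :upcase, 1), label: "String.upcase/1 exported?")
```

In an iex session after a recompile, comparing :code.is_loaded(MyModule) before and after such a check should show whether the module had been unloaded.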

sb8244 commented 2 years ago

Ah, it is false, as you'd guess. It's been minutes since the recompile and the code is still not loaded.

I tried recreating it by changing the same files to see what would happen:

Is this expected? I'm pretty surprised by it. I never changed the module that's error'ing btw, and it isn't one of the files that's being recompiled.

josevalim commented 2 years ago

The code reloader unloads all modules. If said module is still loaded during recompilation, it is either because something during compilation used it, or something triggered that particular module to be recompiled due to a dependency. As I said, I am not sure if there is a straightforward fix; things like changing the server state are also going to lead to crashes.

sb8244 commented 2 years ago

Interesting, TIL! I'm honestly not sure how this has never happened, but I'll work around it. edit: I think that this happened because there's not a compilation dependency between the producer and the invoker. This is not a pattern I use very often.

I expect that state changes in a GenServer would be non-recoverable and need a restart. It's surprising that the same thing happens when I make a text change in the UI. However, I think that I can create a compile dependency to guarantee that this one file stays loaded, as it's handling async messages.

Thanks for the confirmation that it's not an issue and explaining the root cause to me. Also, I see much better now after using :code.is_loaded and :code.all_loaded with a filter. I'll close this out.
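For anyone debugging the same thing, a filter over :code.all_loaded might look like the sketch below. The LoadedModules module and with_prefix/1 are hypothetical names, and the "Elixir.MyApp" prefix in the comment is only an example; Elixir module atoms are stored with an "Elixir." prefix, so the filter has to include it.

```elixir
defmodule LoadedModules do
  # Hypothetical helper for inspecting what is currently loaded.
  # :code.all_loaded/0 returns a list of {module, file} tuples.
  def with_prefix(prefix) do
    for {mod, _file} <- :code.all_loaded(),
        String.starts_with?(Atom.to_string(mod), prefix),
        do: mod
  end
end

# Example: list the loaded modules under the Enum namespace.
# For an app, the prefix would be something like "Elixir.MyApp".
Code.ensure_loaded(Enum)
IO.inspect(LoadedModules.with_prefix("Elixir.Enum"))
```

Running this before and after a code reload shows which application modules were dropped and not loaded again.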

josevalim commented 2 years ago

@sb8244 I had an idea you could try out if you want. :D

In the Phoenix code reloader, we call purge_modules. If the module is purged, we could store it and automatically load it again after compilation. This has a downside though: as you use the app, the set of loaded modules grows, and since they are now all loaded upfront, recompilation gets slower. But maybe we can also gate this behind a flag. Do you want to give it a shot?

In any case, good to close it here, it is def. a Phoenix behaviour.
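A rough sketch of the idea, as a standalone illustration rather than the actual Phoenix implementation: remember which modules were loaded before purging, then ask the code server to load them again after compilation. The ReloadSketch module and both function names are assumptions made for this example.

```elixir
defmodule ReloadSketch do
  # Hypothetical illustration; not taken from Phoenix.CodeReloader.

  # Keep only the modules that are loaded right now.
  # :code.is_loaded/1 returns {:file, path} or false.
  def currently_loaded(modules) do
    Enum.filter(modules, fn mod -> :code.is_loaded(mod) != false end)
  end

  # After compilation, try to load each remembered module again.
  # Code.ensure_loaded/1 returns {:error, reason} instead of raising
  # when a module can no longer be found.
  def load_again(modules) do
    Enum.each(modules, &Code.ensure_loaded/1)
  end
end

remembered = ReloadSketch.currently_loaded([Enum, String])
# ... purge and recompile would happen here ...
ReloadSketch.load_again(remembered)
```

The trade-off mentioned above still applies: the remembered list only grows as more modules get used, so each reload does more work upfront.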

josevalim commented 2 years ago

Oh, here is the code: https://github.com/phoenixframework/phoenix/blob/master/lib/phoenix/code_reloader/server.ex#L269-L278