gleam-lang / otp

📫 Fault tolerant multicore programs with actors
https://hexdocs.pm/gleam_otp/
Apache License 2.0
443 stars 49 forks source link

Actor's `init` taking longer than `init_timeout` causes entire program to crash #73

Open brettkolodny opened 2 months ago

brettkolodny commented 2 months ago

Like the title describes, I've run into an issue where if the inittakes too long instead of returning an error then actor.start_spec will crash the entire program with the following error:

Runtime terminating during boot (killed)

Crash dump is being written to: erl_crash.dump...done

I'm using Erlang/OTP 27 and have tried on Erlang/OTP 25 with the same result.

Below is a code snippet that demonstrates the issue.

import gleam/erlang/process
import gleam/option
import gleam/otp/actor

pub type State {
  State(Int)
}

pub type Message {
  Increment
}

pub fn main() {
  start_actor()

  process.sleep_forever()
}

pub fn start_actor() {
  let spec: actor.Spec(State, Message) =
    actor.Spec(init:, init_timeout: 10, loop:)

  actor.start_spec(spec)
}

pub fn init() {
  process.sleep(1_000_000)
  actor.Ready(State(0), process.new_selector())
}

pub fn loop(_message, state) {
  actor.Continue(state, option.None)
}
lpil commented 2 months ago

Hello! the program is crashing as the linked child process is timing out and it is not supervised, so the exit continues to propagates until it reaches a process trapping exits (there is not one in this case) or until it reaches the root process, crashing the program.

Do you have some particular issue?

brettkolodny commented 2 months ago

If this is the expected behavior then maybe I'm misunderstanding the documentation, or something specific about my implementation. It says here:

Start an actor from a given specification. If the actor’s init function returns an error or does not return within init_timeout then an error is returned.

Based on the documentation my assumption was that the code above wouldn't crash but instead would instead return Error(StatError). Maybe the language "returns an error" is a bit overloaded here and it would be clearer to re-word it?

lpil commented 2 months ago

Ah yes, a bug for sure. Thank you.

I presume we want it to not crash but I've not the time to sit down and think through the implications of that and how we would implement it.

I'm also thinking back to something @crowdhailer said years back, that perhaps init should not fail and starting an actor shouldn't return a result. Could be a good thing to think about now as we're looking at making some breaking changes to support naming.

CrowdHailer commented 2 months ago

My opinion has not changed. But I don't have any new compelling arguments.

lpil commented 2 months ago

What were your old arguments? If you can remember the details 😄

CrowdHailer commented 2 months ago

I think that you should always be able to validate the initial state before calling an init. And you shouldn't rely on anther pid/service without you actor understanding failure in reaching that service. I think Async init was a problem. I dunno, I'd have to play again to make better arguments