elixir-circuits / circuits_gpio

Use GPIOs from Elixir
129 stars 23 forks source link

[Question] v1.2.0, it may happen reload error often? #178

Closed pojiro closed 9 months ago

pojiro commented 9 months ago

v1.2.0 is not released on Github, but it is on hex.pm, https://hex.pm/packages/circuits_gpio/versions.

I confirmed the v1.2.0 has a issue on open, like

 {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}

If not released officially, how about retired it ?

fhunleth commented 9 months ago

It's here: https://github.com/elixir-circuits/circuits_gpio/releases/tag/v1.2.0.

It had been used in production, so I'm not sure why you're getting that error.

pojiro commented 9 months ago

It's here: https://github.com/elixir-circuits/circuits_gpio/releases/tag/v1.2.0.

Thanks, ok. i mistake to read the releases on Github.

It had been used in production, so I'm not sure why you're getting that error.

I think It's caused by this, https://github.com/elixir-circuits/circuits_gpio/pull/142/files#diff-596f8a2c84f5216978df24afaedf32021989ed575e50d201755750080f0d3903

Defer loading, if open is called twice or over, load_nif is also called. So calling load_nif over one time may happen error always. But open gpio each pin, i mean open other pin several time is usuall usage.

fhunleth commented 9 months ago

The unit tests open pins lots of times. Do you have an example that I could try to reproduce the issue? Circuits.I2C, Circuits.SPI and Circuits.GPIO v2 all use the same pattern for loading the NIF.

fhunleth commented 9 months ago

Also, in case you're wondering, I deleted the prereleases from GitHub releases. They did make it hard to see v1.2.0 and I wouldn't want anyone using prerelease versions now anyway.

pojiro commented 9 months ago

The unit tests open pins lots of times. Do you have an example that I could try to reproduce the issue?

Yes, I will make simple reproduce case.

pojiro commented 9 months ago

@fhunleth I made reproducible repo, https://github.com/pojiro/test_gpio .

This repo's firmware for rpi4 can reproduce this issue.

05:23:00.051 [info] {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}, %{ref: nil, pin: 26}

05:23:00.052 [info] {:ok, #Reference<0.1163197885.843710481.257840>}, %{ref: nil, pin: 5}

05:23:00.052 [error] |=====> Stopping Elixir.TestGpio.GpioServer GenServer, reason: {{:badmatch, {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}}, [{TestGpio.GpioServer, :handle_info, 2, [file: ~c"lib/gpio_server.ex", line: 25]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1077]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1165]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}]}

05:23:00.052 [info] {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}, %{ref: nil, pin: 19}

05:23:00.052 [error] |=====> Stopping Elixir.TestGpio.GpioServer GenServer, reason: {{:badmatch, {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}}, [{TestGpio.GpioServer, :handle_info, 2, [file: ~c"lib/gpio_server.ex", line: 25]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1077]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1165]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}]}

05:23:00.053 [info] {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}, %{ref: nil, pin: 13}

05:23:00.053 [error] |=====> Stopping Elixir.TestGpio.GpioServer GenServer, reason: {{:badmatch, {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}}, [{TestGpio.GpioServer, :handle_info, 2, [file: ~c"lib/gpio_server.ex", line: 25]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1077]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1165]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}]}

05:23:00.053 [info] {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}, %{ref: nil, pin: 6}

05:23:00.053 [error] |=====> Stopping Elixir.TestGpio.GpioServer GenServer, reason: {{:badmatch, {:error, {:reload, ~c"NIF library already loaded (reload disallowed since OTP 20)."}}}, [{TestGpio.GpioServer, :handle_info, 2, [file: ~c"lib/gpio_server.ex", line: 25]}, {:gen_server, :try_handle_info, 3, [file: ~c"gen_server.erl", line: 1077]}, {:gen_server, :handle_msg, 6, [file: ~c"gen_server.erl", line: 1165]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 241]}]}
fhunleth commented 9 months ago

Oh! I understand. That's a race condition, and it's really obvious now that I see your test. I'm able to reproduce. I'll have a fix up soon.

fhunleth commented 9 months ago

While I fixed the NIF load race condition, the unit tests uncovered race conditions in the NIF. Those are the CI failures in the PR. Luckily, this is the stub implementation which has shared data that's subject to races and the real implementation shouldn't. However, I'm going to take time to review all of the code to fix these races and check for others now that I know what's happening and can easily reproduce. I suspect that the PRs will fix the issues for you. #180 fixes v1.2.0 and #179 fixes v2.0.0.

fhunleth commented 9 months ago

@pojiro Thank you so much for bringing this to my attention!

pojiro commented 9 months ago

@fhunleth

Thanks for your explanation. Now i can understand what happens in race condition, following code was tricky. It changes its behavior between before load and after load. This issue happens when calling open same time before load, It was interesting.

  def open(pin_number, pin_direction, initial_value, pull_mode) do
    with :ok <- load_nif() do
      apply(__MODULE__, :open, [pin_number, pin_direction, initial_value, pull_mode])
    end
  end

And I confirmed maint-v1.x-fix-nif-race resolves this issue, thank you so match!