fffonion / lua-resty-openssl

FFI-based OpenSSL binding for OpenResty
BSD 2-Clause "Simplified" License
130 stars 43 forks source link

bugfix: fixed multiple use-after-free. #172

Closed zhuizhuhaomeng closed 1 month ago

zhuizhuhaomeng commented 1 month ago

Any progress on this PR? Should we split this PR into multiple PRs of different topics?

fffonion commented 1 month ago

@zhuizhuhaomeng For now I'm only convinced that https://github.com/fffonion/lua-resty-openssl/pull/172/files#diff-04e6b16ea7e3d0a48ecfe02dc0de05c2606b5a3e0c1bb67b0d0530b329e1e52eR143 and https://github.com/fffonion/lua-resty-openssl/pull/172/files#diff-173dab9a5b7e2733fdc5d8b8f13c8b6d331b5c5f95a29b22b9f4c43d8e1bbd9dR60 this two needs a fix. But I'm not sure about the other ones. Let's open a seperate PR only contains this string length fix.

fffonion commented 1 month ago

As I'm slightly confused about the gc-on-return test method and the corresponding fix. When you changethe tail return with a function call then return, it essentially no-op'ed the debug hook you added, while in most places I'm worried about that we are just hiding the root cause. Because I also reproduce the test locally and the error doesn't seem to simply caused by the tail return.

zhuizhuhaomeng commented 1 month ago

The tail call will reset the stack of the current function, so there are no references to the variable. That's why the variable will be gc collect before entering the tail call function.

fffonion commented 1 month ago

I understand the initiative you are trying to fix, but some fix doesn't match my understanding. For example, for this:

local res, err = lib.dup(got)
  if res ~= nil then
    res._dupped_from = got_ref -- keep got_ref from gc
  end

  return res, err

does it work if we just do:

  local res, err = lib.dup(got)
  return res, err

And for some places (for example the x509.extension change), I don't understand how this fix would change the behaviour, thus I'm worried we are just hiding something else by silencing the current check.

I'm okay to move forward if we can solve the open questions in this PR, otherwise I don't feel comfortable to merge them without sound reasoning. I do agree the tail call GC check does catch some corner cases that needs fix.

(On the other side, just for an education for me, is adding the tail call check required because this is enabled in production or would be seen in real world? If so I'm happy to also add this to CI.)

zhuizhuhaomeng commented 1 month ago

I understand the initiative you are trying to fix, but some fix doesn't match my understanding. For example, for this:

local res, err = lib.dup(got)
  if res ~= nil then
    res._dupped_from = got_ref -- keep got_ref from gc
  end

  return res, err

does it work if we just do:

  local res, err = lib.dup(got)
  return res, err

And for some places (for example the x509.extension change), I don't understand how this fix would change the behaviour, thus I'm worried we are just hiding something else by silencing the current check.

I'm okay to move forward if we can solve the open questions in this PR, otherwise I don't feel comfortable to merge them without sound reasoning. I do agree the tail call GC check does catch some corner cases that needs fix.

(On the other side, just for an education for me, is adding the tail call check required because this is enabled in production or would be seen in real world? If so I'm happy to also add this to CI.)

I don't know if this works or not. We need to reproduce this issue and see what was the problem. I will split this PR into multiple small PRs. That would be easy to discuss.

fffonion commented 1 month ago

I have fixed most of the errors in https://github.com/fffonion/lua-resty-openssl/pull/178. It might not be a good idea to run the valgrind+gc hook in CI as it took 30m to finish, do you know if test::nginx supports asan instead of valgrind?