SciRuby / nmatrix

Dense and sparse linear algebra library for Ruby via SciRuby
Other
469 stars 133 forks source link

Specific input may cause `rake pry` to crash #452

Closed zhuanhao-wu closed 6 years ago

zhuanhao-wu commented 8 years ago

Hi there, I was testing how dtype performed a while ago and I found that on my computer, the input below may cause the pry to crash. (You may paste these input directly twice into pry: for the first time, it executes without error, but for the second time, pry will crash)

x = NMatrix.new [3, 3], [[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype: :object
y = NMatrix.new [3, 3], [[1, 2, 3], [4, 1, 0], [5, 5, 2]], dtype: :object
x.dot y # This line may raise an error
x = NMatrix.new [3, 3], [[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype: :object
y = NMatrix.new [3, 3], [[1, 2, 3], [4, 1, 0], [5, 5, 2]], dtype: :object
y = NMatrix.new [3, 3], [[1, 2, 3], [4, 1, 0], [5, 5, 2]]
y = NMatrix.new([3, 3],[[1, 2, 3], [4, 1, 0], [5, 5, 2]])

y = NMatrix.new([3, 3],[[1, 2, 3], [4, 1, 0], [5, 5, 2]], dtype: :float64)
y = NMatrix.new([3, 3],[1, 2, 3, 4, 1, 0,  5, 5, 2], dtype: :float64)
x = NMatrix.new [3, 3], [1, 0, 0, 0, 1, 0, 0, 0, 1], dtype: :object
x

After some more try, I found that, if you input the following code into pry:

x = NMatrix.new [3, 3], [[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype: :object
y = NMatrix.new([3, 3],[[1, 2, 3], [4, 1, 0], [5, 5, 2]], dtype: :float64) # This line may raise an error
y = NMatrix.new([3, 3],[1, 2, 3, 4, 1, 0,  5, 5, 2], dtype: :float64)
x = NMatrix.new [3, 3], [1, 0, 0, 0, 1, 0, 0, 0, 1], dtype: :object
x

It executes without errors. But if you then quit pry, pry will crash. (pry crashes with similar behavior as #449 , maybe a segfault happened..)

Not sure this the problem with nmatrix or pry. Hopefully you can reproduce the errors on your machine.

wlevine commented 8 years ago

I can reproduce the second one, but not the first one.

It seems like this line alone:

y = NMatrix.new([3, 3],[[1, 2, 3], [4, 1, 0], [5, 5, 2]], dtype: :float64)

is enough to cause a segfault on quit.

wlevine commented 8 years ago

So, when we run the line above, it gets up to here: https://github.com/SciRuby/nmatrix/blob/80765a529b78e3da8e918e9e1e8938d52c9f9e20/ext/nmatrix/ruby_nmatrix.c#L2801 at which point a Ruby exception is raised. This causes it to jump back up to the nearest catch block, which in this case is all the way up in pry. So it skips all the remainder of the C code, including the cleanup bits with all the nmunregister* stuff. I suspect, but I haven't confirmed, that this is what causes the crash on quit. The problem is that it's pretty non-trivial to catch a Ruby exception in C code. The other option is that we manually check all the types in the array before trying to convert to floats (or whatever). This has the downside of introducing an extra check that just duplicates the check that would happen anyway (the one that raises the exception in the first place).

translunar commented 8 years ago

Could it be because the array being passed in is not flat? On Tue, Feb 2, 2016 at 4:19 PM Will Levine notifications@github.com wrote:

So, when we run the line above, it gets up to here: https://github.com/SciRuby/nmatrix/blob/80765a529b78e3da8e918e9e1e8938d52c9f9e20/ext/nmatrix/ruby_nmatrix.c#L2801 at which point a Ruby exception is raised. This causes it to jump back up to the nearest catch block, which in this case is all the way up in pry. So it skips all the remainder of the C code, including the cleanup bits with all the nmunregister* stuff. I suspect, but I haven't confirmed, that this is what causes the crash on quit. The problem is that it's pretty non-trivial to catch a Ruby exception in C code. The other option is that we manually check all the types in the array before trying to convert to floats (or whatever). This has the downside of introducing an extra check that just duplicates the check that would happen anyway (the one that raises the exception in the first place).

— Reply to this email directly or view it on GitHub https://github.com/SciRuby/nmatrix/issues/452#issuecomment-178858558.

wlevine commented 8 years ago

Well, that's why the exception gets raised in the first place. I suppose we could support this usecase, but the larger issue is how to clean up properly when an Ruby exception gets raised deep in C code. I suspect that the same issue could also turn up elsewhere in the codebase.

translunar commented 8 years ago

Oy. Yeah, when we designed it, we didn't have the registration and unregistration. I suppose we just have to check argument sanity as soon as stuff comes in, as much as possible.

wlevine commented 8 years ago

Just for the record, I think my diagnosis in the last couple of comments is not entirely correct. For one thing, NM_CONSERVATIVE(nm_register_value(&arg)); should have no effect in the standard build, since this macro resolves to nothing unless NM_GC_CONSERVATIVE is defined. However, I still think that the problem is due to the Ruby exception causing a jump across C code that does some necessary cleanup.

translunar commented 6 years ago

Closed because I'm not sure it still applies. Someone can open a new issue if they like.