SciNim / flambeau

Nim bindings to libtorch

item(Complex64) does not compile #20

Closed: Clonkk closed this issue 3 years ago

Clonkk commented 3 years ago

When trying to obtain a Scalar value from a Tensor of dimension 0, self.item(Complex64) fails during linkage with: undefined reference to `tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA at::Tensor::item<tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA>() const'

Full ld error:

Hint:  [Link]
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /home/rcaillaud/.cache/nim/test_tensor_d/@mtest_tensor.nim.cpp.o: in function `main__5ERS50C2yKrnV9cScP0LGow()':
@mtest_tensor.nim.cpp:(.text+0xa967): undefined reference to `tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA at::Tensor::item<tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA>() const'
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: @mtest_tensor.nim.cpp:(.text+0xaa0c): undefined reference to `tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA at::Tensor::item<tyObject_Complex__zWadV1X9aMO7qS9bsQFB0JFA>() const'
collect2: error: ld returned 1 exit status
Error: execution of an external program failed: 'g++   -o /home/rcaillaud/Workspace/localws/ForkedProjects/flambeau/tests/build/test_tensor  /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_assertions.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_formatfloat.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_io.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_system.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_exitprocs.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_parseutils.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_math.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_unicode.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_strutils.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_streams.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_times.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_hashes.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_sets.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_sequtils.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_os.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_strformat.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_terminal.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_unittest.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/stdlib_complex.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/@m..@s..@s..@scppstl@scppstl@sstring.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/@m..@sflambeau@sraw_bindings@stensors.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/@m..@sflambeau@shigh_level@sinterop.nim.cpp.o /home/rcaillaud/.cache/nim/test_tensor_d/@mtest_tensor.nim.cpp.o  -lm -lrt -L/home/rcaillaud/Workspace/localws/ForkedProjects/flambeau/vendor/libtorch/lib -lc10 -ltorch_cpu -Wl,-rpath,/home/rcaillaud/Workspace/localws/ForkedProjects/flambeau/vendor/libtorch/lib   -ldl'
Clonkk commented 3 years ago

I think the problem comes from the fact that Complex[T: SomeFloat] in Nim does not translate to std::complex.
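
For context, an illustrative guess at the shape of the binding involved (not the actual Flambeau source) shows why this surfaces as a link error rather than a compile error: the C++ template is instantiated with whatever Nim type is passed, so Nim's own Complex object ends up in the generated C++:

func item*(self: Tensor, T: typedesc): T {.importcpp: "#.item<'0>()".}

# With T = Complex64, the emitted C++ calls
# at::Tensor::item<tyObject_Complex__...>(), a template instantiation that
# libtorch never compiled, hence the undefined reference at link time.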

HugoGranstrom commented 3 years ago

Yes, we haven't wrapped std::complex yet.

Clonkk commented 3 years ago

Is it std::complex or c10::complex?

HugoGranstrom commented 3 years ago

It's probably c10::complex now that you mention it. But I think we can be sure that there are implicit conversions taking place on the C++ end as well 😉

I read up on it; they made this type basically because std::complex didn't work on CUDA.

Clonkk commented 3 years ago

I made progress on wrapping std::complex in cppstl, and I added converters between c10::complex and std::complex in Flambeau. I'm still getting a segfault when using .item on a complex tensor.
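
For reference, a rough sketch of what such a wrapping and conversion could look like (C10Complex, initC10Complex and the converters are hypothetical names, not the actual cppstl/Flambeau code; this only compiles with the C++ backend and the libtorch headers on the include path):

import std/complex

type
  C10Complex*[T: SomeFloat] {.importcpp: "c10::complex<'0>",
                              header: "c10/util/complex.h".} = object

# c10::complex mirrors std::complex: an (re, im) constructor plus
# real()/imag() accessors.
func initC10Complex*(re, im: float64): C10Complex[float64]
  {.constructor, importcpp: "c10::complex<double>(@)".}
func real*[T: SomeFloat](c: C10Complex[T]): T {.importcpp: "#.real()".}
func imag*[T: SomeFloat](c: C10Complex[T]): T {.importcpp: "#.imag()".}

# Converters between Nim's Complex64 and c10::complex<double>
converter toC10Complex*(c: Complex64): C10Complex[float64] =
  initC10Complex(c.re, c.im)

converter toComplex64*(c: C10Complex[float64]): Complex64 =
  complex(c.real, c.imag)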

HugoGranstrom commented 3 years ago

Are you getting the exact same error message as before? I don't know why, but I can't get my installation to compile anymore, so I can't check for myself atm :/ Edit: I just forgot --cc:vcc 🤦

HugoGranstrom commented 3 years ago

I'm only getting this when running:

Traceback (most recent call last)
C:\Users\hugog\code\nim\flambeau\tests\test_tensor.nim(133) test_tensor
C:\Users\hugog\.choosenim\toolchains\nim-1.4.4\lib\pure\unittest.nim(529) main
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: 'C:\Users\hugog\code\nim\flambeau\tests\build\test_tensor.exe '

Is that the same as you?

Clonkk commented 3 years ago

With c10::complex and std::complex wrapped, I get a segmentation fault as well.

HugoGranstrom commented 3 years ago

Ok, I'm actually getting a segfault from the max function itself even without item:

let m = max(ifftout)

HugoGranstrom commented 3 years ago

Yes, it seems to be the issue. If I replace it with:

let m = ifftout[0].item(Complex64)

it runs just fine. So I guess Libtorch doesn't support max(ComplexTensor), maybe because it doesn't make sense to have a bigger/smaller complex number?

Clonkk commented 3 years ago

Good find! I was planning on investigating that today to finish the FFT part of test_tensor.

Next steps would be adding the IndexDefect exception to indexing and adding a way to check the Tensor type for procs that use 2 Tensors (see https://github.com/SciNim/flambeau/issues/19).
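
For what it's worth, a rough sketch of the IndexDefect part (size and uncheckedIndex are illustrative stand-ins for whatever the raw binding exposes, not existing Flambeau procs):

# Bounds check in front of a raw, unchecked index operation.
func `[]`*(self: Tensor, i: int): Tensor =
  if i < 0 or i >= self.size(0):
    raise newException(IndexDefect,
      "index " & $i & " out of bounds for axis 0 (size " & $self.size(0) & ")")
  result = uncheckedIndex(self, i)  # forward to the raw binding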

HugoGranstrom commented 3 years ago

Nice!

It sounds really tedious to check that for every proc, though. We would have to wrap a lot of procs like this:

func dot_internal(self: Tensor, other: Tensor): Tensor {.importcpp: "#.dot(@)".}

func dot*(self, other: Tensor): Tensor =
  checkTypes(self, other)
  result = dot_internal(self, other)

And it sounds hard to automate it with a macro as well if we have additional parameters beyond the two Tensors.
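
For the common case where the two Tensors are the first two parameters, a pragma macro could inject the check; a rough sketch (withTypeCheck and checkTypes are illustrative names, and any extra parameters after the two Tensors are simply ignored by the check):

import std/macros

macro withTypeCheck(procDef: untyped): untyped =
  # Inject checkTypes(a, b) at the top of the body, where a and b are the
  # proc's first two formal parameters.
  let a = procDef.params[1][0]
  let b = procDef.params[2][0]
  procDef.body.insert(0, newCall(ident"checkTypes", a, b))
  result = procDef

func dot*(self: Tensor, other: Tensor): Tensor {.withTypeCheck.} =
  result = dot_internal(self, other)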

Doesn't Libtorch check this itself? Or is that check skipped, perhaps because it is a release build?

Clonkk commented 3 years ago

The type check could be implemented simply by making Tensor generic.

type
  Tensor[T] {.importcpp: "Tensor".} = object

And then making all bindings generic.

But if we do that, as @mratsim argued, we lose the "Python-esque" API and it complicates the iterator part, so an alternative is:

type
  RawTensor {.importcpp: "Tensor".} = object
  Tensor[T] = distinct RawTensor

This means having a non-generic API and a generic one, either in Flambeau (by reorganizing folders a bit and having a raw folder to keep the RawTensor) or in another SciNim project to keep the higher-level Tensor.

IMHO, having flambeau/ as the exported generic API and flambeau/raw as the non-generic C++-bound API (not exported by default, but you can still import it specifically) is better than a new project.
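
For illustration, a rough sketch of what that layering could look like (rawDot stands in for an existing non-generic binding; all names here are illustrative):

type
  RawTensor {.importcpp: "torch::Tensor", header: "torch/torch.h".} = object
  Tensor[T] = distinct RawTensor

func rawDot(self, other: RawTensor): RawTensor {.importcpp: "#.dot(@)".}

# The generic layer converts and forwards; mixing element types is rejected
# at compile time, so no runtime checkTypes is needed for this case.
func dot*[T](self, other: Tensor[T]): Tensor[T] =
  Tensor[T](rawDot(RawTensor(self), RawTensor(other)))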

HugoGranstrom commented 3 years ago

Agreed, that would make things a lot easier 👍

I'm ready to agree with you that the second option is better. And I also think that it makes things much easier to handle if we keep them in the same repo (if only for now). The most important thing is that it makes tests and CI easier, as we don't have to rely on Nimble to match the correct versions. This way we would guarantee that the version of the raw wrapper always matches the version of the high-level one, as they come bundled with each other. I also think, as you mentioned before, that it creates less confusion if we don't spread it out over separate repos. This is the way Pytorch does it as well, having both Pytorch and Libtorch in the same repo.

Clonkk commented 3 years ago

There is now a test in test_tensor for this issue.