Closed HGuillemet closed 9 months ago
@HGuillemet thanks for the update and especially adding the AOTInductor mapping. I think that's an interesting new variant to be able to use the new optimizations at least for inference. Training still needs to happen in Python in that case but we can export it to a C++ lib and then use that from Java with the bindings.
Have you been able to try it already? I'll give it a try myself, just curious if you got something working already.
No I haven't, yet. I'm training from Java, so there is little chance I'll use it in the near future. I must rely on TorchScript when I need to import Python models, and not a lot of people bother to write TorchScript-compatible models :(
I don't understand the linking error on Windows:
jnitorch_cuda.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl torch::inductor::AOTIModelContainerRunnerCuda::~AOTIModelContainerRunnerCuda(void)" (__imp_??1AOTIModelContainerRunnerCuda@inductor@torch@@QEAA@XZ)
jnitorch_cuda.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl torch::inductor::AOTIModelContainerRunnerCuda::AOTIModelContainerRunnerCuda(char const *,unsigned __int64,char const *)" (__imp_??0AOTIModelContainerRunnerCuda@inductor@torch@@QEAA@PEBD_K0@Z)
The AOTIModelContainerRunnerCuda constructor is defined in aoti_model_container_runner_cuda.h, which is included from jnitorch_cuda.cpp, and no destructor is declared anywhere for this class.
Any idea?
@saudet, could this be related to ccache and, if so, could you clear the cache?
There's probably some template somewhere that requires them. You'll probably get the same errors on Linux and Mac if you try to link with -Wl,--no-undefined, so try to fix the errors you get with that, and it should fix those errors on Windows too.
Thanks for the suggestion. Adding the linker option raised an error about cudnn not being linked to jnitorch_cuda. Let's see, but I doubt it's related to the error on Windows.
It seems it has been spotted and fixed upstream in https://github.com/pytorch/pytorch/commit/79ba39710e89bf9f2f12b199aae87afc402a3176 after the 2.2.0 release. I guess we'd better postpone the inclusion of the AOTInductor feature to the next release.
To enable the setting of hooks in autograd graphs, I need to virtualize FunctionPreHook and FunctionPostHook, which have a virtual method taking a ref to a vector of tensors and returning a vector of tensors. Compilation passes only if I remove the valueTypes in this info:

new Info("std::vector<torch::Tensor>", "std::vector<at::Tensor>", "std::vector<torch::autograd::Variable>", "torch::autograd::variable_list")
    .valueTypes("@Cast({\"\", \"std::vector<torch::Tensor>\"}) @StdMove TensorVector")
    .pointerTypes("TensorVector").define())

I wonder why this valueTypes is here.
It will save a copy when we pass a vector of tensors to a native function, but OTOH it will destroy the vector, while the user could need it after the function call.
If I understand well, if a native function takes an rvalue ref (&&), the parser will generate @ByRef(true), which is enough to avoid copies.
@saudet could you share your infinite knowledge about this point?
Could it break something if I remove the valueTypes? First attempts seem to show it does not.
There are some other types with this kind of @Cast @StdMove value types (DataPtr, Storage, TensorMaybeOwned, TensorBaseMaybeOwned, TensorName, EdgeVector).
If you're not getting any compile errors, then I guess PyTorch's API was improved so that we don't need them anymore, yes.
@sbrunk could you run your tests on this PR? Anything you'd like to be added?
Tests are looking good!
If you're not planning on making more changes for now, we can merge this?
Ok for me.
2024-03-04T11:44:04.1783504Z Caused by: java.io.IOException: No space left on device
Could you try to fix this? We probably just need to uninstall a couple of large unnecessary packages...
I had already added a bunch of rm commands in deploy-ubuntu once, on downloaded archives after their installation, and you reverted that. I can try to add them again; that seems the easiest and fastest way to make room.
In a new PR?
Really? Could you point me to that revert and I'll try it on the actions branch here.
I'm seeing it was on deploy-centos, in fact: https://github.com/bytedeco/javacpp-presets/pull/1360/commits/3e3fe5c096b76b0e8cd9a116d1986f5858308391
I'm reviewing deploy-ubuntu and adding similar cleanup. Shall I push the commit here or in a new PR?
Ah, that won't be enough. We'll probably need to remove a lot more stuff. You can try it here, but we won't know if it works until an actual deploy, so I don't know. Let's check how many more GB we can free with df -h
I guess that's a good indication for now.
I pushed https://github.com/bytedeco/javacpp-presets/commit/aaa37a17afba0d06fc4803a48d5caa6d69eb2224 on my branch but it doesn't update this PR now that it's merged.
Anyway, this commit will indeed only save ~700 MB for the pytorch build, due to the MKL archive removal.
If it's not enough, what about a Maven clean phase on the main artifact after its deploy phase and before the deploy phase of the platform/ext artifact? This should get rid of the cppbuild directory (about 7 or 8 GB).
That could work, yes
Work in Progress
Included in this PR:
- ExampleStack and TensorExampleStack constructors
- CUDAFunctions.h (wrappers around commonly used CUDA API functions)
- Add AOTInductor (new way to run models exported from Python)
- FunctionPreHook and FunctionPostHook to enable setting hooks on autograd graph
- Module.asXXX to test if a Module is of a specific subclass (and do the cast)
- Float8_e5m2fnuz and Float8_e4m3fnuz with unsigned zero
- nvrtc-builtins (fixes #1468)