Closed HGuillemet closed 9 months ago
@HGuillemet thanks for the update and especially adding the AOTInductor mapping. I think that's an interesting new variant to be able to use the new optimizations at least for inference. Training still needs to happen in Python in that case but we can export it to a C++ lib and then use that from Java with the bindings.
Have you been able to try it already? I'll give it a try myself, just curious if you got something working already.
No I haven't, yet. I'm training from Java, so there is little chance I'll use it in the near future. I must rely on TorchScript when I need to import Python models, and not a lot of people bother to write TorchScript-compatible models :(
I don't understand the linking error on Windows:
jnitorch_cuda.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl torch::inductor::AOTIModelContainerRunnerCuda::~AOTIModelContainerRunnerCuda(void)" (__imp_??1AOTIModelContainerRunnerCuda@inductor@torch@@QEAA@XZ)
jnitorch_cuda.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl torch::inductor::AOTIModelContainerRunnerCuda::AOTIModelContainerRunnerCuda(char const *,unsigned __int64,char const *)" (__imp_??0AOTIModelContainerRunnerCuda@inductor@torch@@QEAA@PEBD_K0@Z)
The AOTIModelContainerRunnerCuda constructor is defined in aoti_model_container_runner_cuda.h, which is included from jnitorch_cuda.cpp, and no destructor is declared anywhere for this class.
Any idea?
@saudet, could this be related to ccache and, if so, could you clear the cache?
There's probably some template somewhere that requires them. You'll probably get the same errors on Linux and Mac if you try to link with -Wl,--no-undefined, so try to fix the errors you get with that, and it should fix those errors on Windows too.
Thanks for the suggestion. Adding the linker option raised an error about cudnn not being linked to jnitorch_cuda. Let's see, but I doubt it's related to the error on Windows.
It seems it has been spotted and fixed upstream in https://github.com/pytorch/pytorch/commit/79ba39710e89bf9f2f12b199aae87afc402a3176 after the 2.2.0 release. I guess we'd better postpone the inclusion of the AOTInductor feature to the next release.
To enable the setting of hooks in autograd graphs, I need to virtualize FunctionPreHook and FunctionPostHook, which have a virtual method taking a ref to a vector of tensors and returning a vector of tensors. Compilation passes only if I remove the valueTypes in this info:

new Info("std::vector<torch::Tensor>", "std::vector<at::Tensor>", "std::vector<torch::autograd::Variable>", "torch::autograd::variable_list")
    .valueTypes("@Cast({\"\", \"std::vector<torch::Tensor>\"}) @StdMove TensorVector")
    .pointerTypes("TensorVector").define())

I wonder why this valueTypes is here.
It will save a copy when we pass a vector of tensors to a native function, but OTOH it will destroy the vector, while the user could need it after the function call.
If I understand well, if a native function takes an rvalue ref (&&), the parser will generate @ByRef(true), which is enough to avoid copies.
@saudet could you share your infinite knowledge about this point?
Could it break something if I remove the valueTypes? First attempts seem to show it does not.
There are some other types with this kind of @Cast @StdMove value types (DataPtr, Storage, TensorMaybeOwned, TensorBaseMaybeOwned, TensorName, EdgeVector).
If you're not getting any compile errors, then I guess PyTorch's API was improved so that we don't need them anymore, yes.
@sbrunk could you run your tests on this PR? Anything you'd like to be added?
Tests are looking good!
If you're not planning on making more changes for now, we can merge this?
Ok for me.
2024-03-04T11:44:04.1783504Z Caused by: java.io.IOException: No space left on device
Could you try to fix this? We probably just need to uninstall a couple of large unnecessary packages...
I had already added a bunch of rm commands in deploy-ubuntu once, on downloaded archives after their installation, and you reverted that. I can try to add them again; that seems the easiest and fastest way to make room.
In a new PR?
Really? Could you point me to that revert and I'll try it on the actions branch here.
I'm seeing it was on deploy-centos, in fact: https://github.com/bytedeco/javacpp-presets/pull/1360/commits/3e3fe5c096b76b0e8cd9a116d1986f5858308391
I'm reviewing deploy-ubuntu and adding similar cleanup. Shall I push the commit here or in a new PR?
Ah, that won't be enough. We'll probably need to remove a lot more stuff. You can try it here, but we won't know if it works until an actual deploy, so I don't know. Let's check how many more GB we can free with df -h
I guess that's a good indication for now.
I pushed https://github.com/bytedeco/javacpp-presets/commit/aaa37a17afba0d06fc4803a48d5caa6d69eb2224 on my branch but it doesn't update this PR now that it's merged.
Anyway, this commit will indeed only save ~700 MB for the pytorch build, due to the MKL archive removal.
If it's not enough, what about a Maven clean phase on the main artifact after its deploy phase and before the deploy phase of the platform/ext artifact? This should get rid of the cppbuild directory (about 7 or 8 GB).
That could work, yes
Work in Progress
Included in this PR:
- ExampleStack and TensorExampleStack constructors
- CUDAFunctions.h (wrappers around commonly used CUDA API functions)
- Add AOTInductor (new way to run models exported from Python)
- FunctionPreHook and FunctionPostHook to enable setting hooks on autograd graph
- Module.asXXX to test if a Module is of a specific subclass (and do the cast)
- Float8_e5m2fnuz and Float8_e4m3fnuz with unsigned zero
- nvrtc-builtins (fixes #1468)