Closed WangYuyao closed 2 months ago
Thank you for your contribution @WangYuyao The code has a few issues integrating with its CUDA counterparts. I would deal with these when integrating your efforts. Unfortunately the test cases you introduced are failing (locally and in the GitHub CI). I guess they did run when you tested before submitting the PR?
Yes, all the test cases were run before submitting and they are all passed on my machine.
Hi @corepointer, thanks for looking into this PR! I've run the test suite locally on my machine yesterday and all tests passed (both in the daphne-dev
(X86-64) and the github-action
CI container)...
Thanks for pointing out that there are inconsistencies with the CUDA counterparts. Sure, it would be great if you could address them or give @WangYuyao some hints on how to fix those himself.
@WangYuyao told me that he would be happy to continue working on this contribution over the summer to increase its value for DAPHNE.
Hi @WangYuyao & @pdamme, I ran the tests again and they are still failing for me (running in the daphne-dev container in debug and release mode). If you want to continue your work to get behind these errors, this would be highly appreciated of course :) For the integration with the main branch: besides some issues that arise if you try to compile with CUDA enabled, there will be some additional refactoring required to align with what's already there and what's coming from #734 (naming scheme changes mostly).
Hi @WangYuyao & @pdamme, I ran the tests again and they are still failing for me (running in the daphne-dev container in debug and release mode). If you want to continue your work to get behind these errors, this would be highly appreciated of course :) For the integration with the main branch: besides some issues that arise if you try to compile with CUDA enabled, there will be some additional refactoring required to align with what's already there and what's coming from #734 (naming scheme changes mostly).
Dear @corepointer, the commit with ID fe41fef is the implementation of DNN CPU Ops which passed on both my and pdamme's machine. The latter is the attempt to implement a DNN example in DaphneDSL including adding the respective DaphneDSL built-in functions, registering the DNN CPU kernels, etc. However, I failed in this part. For example, the kernel function max_pooling returned a message "std::bad_array_new_length" and now I am trying to debug it.
Hi! According to [1], this exception can happen for various reasons. Maybe a negative number is passed to some result matrix allocation or something like this? Unknown dimensions are represented by -1, so this could be a reason for the error. You could check what sizes are used there with a debugger. hth, Mark
[1] https://en.cppreference.com/w/cpp/memory/new/bad_array_new_length
Hi! According to [1], this exception can happen for various reasons. Maybe a negative number is passed to some result matrix allocation or something like this? Unknown dimensions are represented by -1, so this could be a reason for the error. You could check what sizes are used there with a debugger. hth, Mark
[1] https://en.cppreference.com/w/cpp/memory/new/bad_array_new_length
Thank you for your reply and I will try to fix my code following your guidance!
Additionally, would it be convenient for you to communicate via email?
Thank you for your patience!
Hi! Yes sure. Please connect us @pdamme Best, Mark
Hi @corepointer, thanks for looking into this PR! I've run the test suite locally on my machine yesterday and all tests passed (both in the
daphne-dev
(X86-64) and thegithub-action
CI container)...
I must correct myself. I noticed that I had commented out the new unit test cases for the DNN-related kernels when I ran the tests... When I don't comment out the test cases, they fail. However, I think I have found the cause. It's not directly related to the kernels, but to a subtle issue with the test case code itself. Each of those test cases had a little helper function, and those were all called check
. When we give them unique names, e.g., checkBiasAddForward
, then all test cases pass on my machine in the daphne-dev
container. See my commit for details. I don't fully understand why that led to a problem, since the unit tests should be independent compilation units, but maybe there is a problem when they are used by catch2, I can only speculate...
However, now the CI tests fail with a different problem, already in the "Initialize Containers" step, i.e., unrelated to the DAPHNE source code:
failed to register layer: write /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: no space left on device
@corepointer: Do you have an idea what is going wrong there?
If the non-unique name check
is a problem, then we might want to change that for the CUDA-based DNN kernel test cases, too.
Sorry, just hit the wrong button. This PR is still open.
Dear @corepointer, I have questions about the form of the kernel ops definition in DAPHNE.
I found 2 ways of defining kernels. One is using 'namespace' and the other is 'Struct for partial template specialization - Convenience function - (Partial) template specializations for different data/value types'. Is there any difference between them?
Additionally, in the way of using 'Namespace', I found there is a namespace function named 'Forward'. Does it mean that originally there should also be a function named 'Backward' to implement the backward ops of DNN, and correspondingly in DaphneDSL DNN functions will be like 'Conv2d.forward' and 'Conv2d.backward'?
Similarly, to integrate both GPU and CPU versions of DNN ops, what should the form of functions be like in DaphneDSL? Will there be an argument like "use_cuda == 1" or a namespace function like 'conv2d.cuda' or 'conv2d' to distinguish which version of functions to use?
Thank you for your guidance!
Dear @WangYuyao, The name spaces in the C++ kernel implementations is just an organizational measure. And I mostly omitted the convenience function as most code that calls the kernels is generated anyway. Also, Pooling is a special case where I put the opcode into the template parameters. This would normally be a function parameter in the convenience function that passes it on to the apply() method.
In the namespace of a kernel, e.g., CUDA::NN::Softmax
there should be the structs Forward
and Backward
that have an apply() method. This would be the place where forward and backward passes for the Softmax layer can be implemented.
In the DSL I went for wrapper functions. But that's not merged to main yet and lives in the branch of PR #734. This is also how it's done in SystemDS and allows to encapsulate utility code like conv2d.init(...)
and you can also easily exchange a script based conv2d
against a kernel based call. So conv2d.forward(...)
would call conv2d(..)
appropriately. The backward functions I'd name the same way as in SystemDS (e.g., conv2d_backward_filter
and conv2d_backward_data
). With a wrapper script you can then call them both with conv2d.backward(...)
. Haven't implemented that yet in #734 though.
The decision if the CPU or GPU version is called will primarily be made by the DAPHNE compiler. At the moment the compiler pass to mark operations as GPU ops is quite primitive. But if an operation has the CUDASupport trait and the input is adequately sized, then the comiler would generate a GPU instruction instead of a CPU instruction. But only if you compiled with CUDA support and run with the --cuda
CLI parameter or set use_cuda=true
in the UserConfig.json file. There will be a mechanism to use special decorations in the DSL to pass extra information to the compiler. But that's a feature that is not implemented yet.
hth, Mark
Dear @corepointer , I have just uploaded my attempt to implement the CUDA version of batch_norm_backward op and the corresponding test file. However, when I tried to test the code by entering the command " ./build.sh --no-deps --target run_tests && ./test.sh -nb batch_norm_bwd* -d yes --cuda", the CUDA version was not executed. Could you please check my test file and provide me with some guidance on it?
Thank you for your time!
Hi,
Sorry for the delayed answer.
Two things caught my eye:
build.sh --clean -y
before switching from no CUDA to with CUDA support.hth, Mark
Dear @corepointer,
the implementation of BatchNormBackward GPU op and the corresponding test case is finished. I found that there is no implementation of that in the main repository and it is my pleasure that mine could be a part of #734 if it works correctly.
Besides, I noticed that there are conflicts in 'kernels.json', could you please give me some guidance on how to fix it?
Thank you for your help!
As I mentioned in my email, I fixed the issue in the course of rebasing your commits on top of main. So this will also cause conflicts. Maybe we can fix this next week. Regards, Mark
As I mentioned in my email, I fixed the issue in the course of rebasing your commits on top of main. So this will also cause conflicts. Maybe we can fix this next week. Regards, Mark
Gerne
Thank you for your effort @WangYuyao :+1:
After rearranging some thing and some cleanup I merged your contribution to the main branch. I did, however, not force push to your fork's main branch. This is why this PR is now marked as closed and not merged.
AMLS Exercise "DNN fallback CPU ops #226"DNN fallback CPU ops #226" Yuyao Wang