Closed aabbas90 closed 3 years ago
Hey!
I am able to reproduce the error and will take a look at the problem. Just wanted to give you a heads up!
Should be fixed now. There was a typo in one of my kernels that caused the incorrect behavior. Please let me know if the problem is gone now. See https://github.com/GPUPeople/spECK/commit/367560d8efc93823474d4aadb3f21ba6c9b7bcfb
@dabeschte Thanks for the quick fix! My toy example works fine now.
There is a second issue though: In some instances of my real use-case I am encountering an "illegal memory access error". I am attaching one such example instance. I have checked calling cuSparse which gives correct result.
However, cuSparse does not work for very large matrices and the alternate is too slow (see: https://forums.developer.nvidia.com/t/cusparse-cusparsescsrgemm2-much-slower-than-spgemm/181620/2). Thanks for your help!
I can't reproduce this error. What hardware are you using, which GPU? Which CUDA version? Windows/Linux? The output for these matrices is:
trying to load csr file "D:/D Downloads/matrices/A_big.hicsr"
successfully loaded: "D:/D Downloads/matrices/A_big.hicsr"
trying to load csr file "D:/D Downloads/matrices/B_big.hicsr"
successfully loaded: "D:/D Downloads/matrices/B_big.hicsr"
Matrix: 333724x333724: 2220633 nonzeros
var-SpGEMM -> NNZ: 2216832
var-SpGEMM SpGEMM: 22.4699 ms
I was using cuda_11
branch but master
branch runs fine (with appropriate CUDA versions). I am using Linux. For GPUs I tried Tesla V100 and Tesla P40 and had the same results.
Hi @dabeschte . Any leads to finding the reason of the crash for cuda 11? Thanks!
Sorry for taking so long. I am running cuda 11 too, using the 3090 on Windows 10. So I think it is very unlikely that cuda 11 is the problem. I still did not have the time to run it on Linux, but I will try to do it today. Unfortunately, I don't have the same hardware available, but I would be surprised if that is the problem.
Do you know where the exception is thrown? (file/line number?)
The exception was thrown at:
The values inside the input arguments seem fine. I think the error occurs somewhere before this call already and is caught at this location.
Unfortunately, I was not able to reproduce the error in Ubuntu 20.04. with cuda 11.1 and the RTX 3090.
That makes debugging a little more difficult... The location of the exception could mean that the error happens in cub, but I think that is very unlikely. The exception could likely mean that spECK is accessing already freed memory. The dCSR class automatically frees all arrays in the destructor. Could it be that the destructor is called before for a copy of the object? Try to comment this line https://github.com/GPUPeople/spECK/blob/367560d8efc93823474d4aadb3f21ba6c9b7bcfb/source/dCSR.cpp#L40 to be sure.
I am sorry for the trouble. I am just really running out of ideas why this bug could happen.
Hi @dabeschte , Thanks for investigating it further. I have drilled it down further:
cuda 11.2.2 works for me too.
One problem could be that you are using more than 49152 "dynamic" memory. Or are you using the same "dynamic" as "static" memory for pascal? ...since that architecture also does not support larger dynamic sizes. https://github.com/GPUPeople/spECK/blob/367560d8efc93823474d4aadb3f21ba6c9b7bcfb/include/Multiply.h#L11
For Pascal I use the default values i.e. 49152. I increased the value for dynamic memory when I tested on Volta.
that sounds OK. I will try to compile it for Pascal, but that might take a few days. Do you need it to work on both P40 and V100 with cuda 11.2?
I am trying to use spECK in my research code which will possibly be made open-source. So I would be glad if we can find and eliminate configurations which are not supported by spECK for example not supporting CC61 is fine by me (in that case one can revert to cuSparse). So we find largest possible set of configurations which are guaranteed to work by spECK and update the getting started section: (https://github.com/GPUPeople/spECK/blob/master/readme.md#getting-started). What do you think?
Yeah I agree that it would be better to just fix the bugs. They should be minor anyways, since the code was originally developed on a cc61 GPU, but then optimized for a Titan V (cc70).
I found the Pascal bug which actually has nothing to do with the GPU generation. It was a bug present in all versions, but only triggered on pascal (and older) due to smaller shared memory. I hope this is the last bug now. If you find something else, please tell me!
Found another failure case on two 'huge' matrices with cc70 and Tesla V100. Downloadable from: https://drive.google.com/file/d/12rqw4sSSjvhryC69-DW8bZsbPvl4LBDG/view?usp=sharing
Nice, thanks for pointing me to all bugs, this is a great contribution to the project! This bug is also fixed now. Let me know if you find more bugs
@aabbas90 are the bugs fixed now? Can I close the issue?
Yes. Thanks for your support.
Hi,
I am trying to multiply two very small (but different) matrices
A: 4 x 7, B: 7 x 4
the dimension and fill pattern of the resulting matrix is correct but the values are not. For example showing one such comparison:Row: 2 | Values do NOT match: (Ref|Comp) : (-3.000000|2.000000) - pos: 0/3 - col 0
I modified the
DataLoader.cpp
as follows to read the two matrices as:I am attaching the matrix files. matrices.zip
Is there something I am doing wrong? Thanks!