Skeleton003 opened this issue 5 months ago (status: Open)
@frozenbugs @Rhett-Ying @peizhou001 @mfbalin @BarclayII I'd be grateful for your opinions and suggestions on this issue. Because this involves modifying code written many years ago, I'm afraid I may not have thought it through enough.
I think it's a historical issue: only `IdType` is used, without taking other scenarios into consideration. Dynamically determining the dtypes of `indptr`, `indices` and `data` is the correct way; please go ahead with the required changes.
Before the change, is it possible to add a data type check that throws an exception in the scenario we hit in this issue?
Yes, please help review #7459.
After further investigation, I've found a possible reason why we need `IdType`: the CUDA implementations of `COOToCSR` take different approaches for `int32` and `int64`. See https://github.com/dmlc/dgl/blob/ed50c170dda9627730cb8ee4c7110205b6ea09de/src/array/cuda/coo2csr.cu#L25 and https://github.com/dmlc/dgl/blob/ed50c170dda9627730cb8ee4c7110205b6ea09de/src/array/cuda/coo2csr.cu#L100.
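For context, the core of a sorted COO-to-CSR step does not inherently depend on the index width. The CPU sketch below (hypothetical `SortedRowsToIndptr`, not the code at the linked lines) fills `indptr` with one upper-bound binary search per row and is templated on both the row id type and the indptr type. The `int32` CUDA path presumably differs because it calls into cuSPARSE, whose legacy `cusparseXcoo2csr` routine takes 32-bit indices; whether that path can be replaced without a performance loss is exactly the open question here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: build a CSR indptr from a row-sorted COO row array.
// indptr[r + 1] = number of entries whose row id is <= r (an upper bound),
// so IdType (row ids) and IndptrType (offsets) are independent choices.
template <typename IdType, typename IndptrType>
std::vector<IndptrType> SortedRowsToIndptr(const std::vector<IdType>& row,
                                           std::int64_t num_rows) {
  std::vector<IndptrType> indptr(static_cast<std::size_t>(num_rows) + 1, 0);
  for (std::int64_t r = 0; r < num_rows; ++r) {
    // Each row is computed independently, which maps naturally onto one GPU
    // thread per row.
    auto it = std::upper_bound(row.begin(), row.end(), static_cast<IdType>(r));
    indptr[static_cast<std::size_t>(r) + 1] =
        static_cast<IndptrType>(it - row.begin());
  }
  return indptr;
}
```

For example, `SortedRowsToIndptr<int32_t, int64_t>({0, 0, 1, 3}, 4)` yields `{0, 2, 3, 3, 4}`: `int32` ids with an `int64` indptr, the combination needed when `NNZ` exceeds `INT32_MAX`.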
If there is no way to merge these two approaches into a single one, we have to keep `COOToCSR` a template function with the `IdType` parameter. Even so, we still need to dynamically determine the dtypes of `ret_indptr`, `ret_indices` and `ret_data`.
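A minimal sketch of that runtime dtype selection, following the rules in the Working Plan of the issue body. `DType`, `CSRDtypes` and `DetermineCSRDtypes` are hypothetical names; it also assumes `coo.row` and `coo.col` share a dtype, as they do in DGL's COO matrices.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical, simplified dtype tag for illustration only.
enum class DType { kInt32, kInt64, kFloat32, kFloat64, kBool };

struct CSRDtypes {
  DType indptr;
  DType indices;
  DType data;
};

// IdType (the dtype of coo.row) can remain a template parameter of the
// kernels; only the output dtypes are decided here, at runtime.
CSRDtypes DetermineCSRDtypes(DType row_dtype, std::optional<DType> data_dtype,
                             std::int64_t nnz) {
  CSRDtypes out{};
  // Column ids keep the id dtype (assumed equal for coo.row and coo.col).
  out.indices = row_dtype;
  // indptr stores offsets up to nnz, so it must widen when nnz overflows int32;
  // int64 ids always get an int64 indptr.
  const bool nnz_exceeds_int32 =
      nnz > static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
  out.indptr = (row_dtype == DType::kInt64 || nnz_exceeds_int32) ? DType::kInt64
                                                                 : DType::kInt32;
  // coo.data keeps its own dtype; without coo.data, fall back to indptr's dtype.
  out.data = data_dtype.value_or(out.indptr);
  return out;
}
```

With `int32` ids, `float32` data and `NNZ = 2^31`, this gives `int64` indptr, `int32` indices and `float32` data: three different dtypes, which a single `IdType` cannot express.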
Work Item
IMPORTANT:
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Background
We now have four C++ functions responsible for converting COO to CSR: `SortedCOOToCSR`, `UnSortedSmallCOOToCSR`, `UnSortedSparseCOOToCSR` and `UnSortedDenseCOOToCSR`. The appropriate `COOToCSR` function is selected among them by a heuristic (https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L749).

Bug
Currently, all four `COOToCSR` functions are defined with `template <class IdType>`. Take `UnSortedSparseCOOToCSR` as an example: https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L413-L414

When converting COO to CSR, `IdType` is designated at https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/array.cc#L809, which means that `IdType` is in fact the dtype of `coo.row`.

Then, in the current implementation of these `COOToCSR` functions, the constructed `ret_indptr`, `ret_indices` and `ret_data` are all created with dtype `IdType`: https://github.com/dmlc/dgl/blob/f0213d2163245cd0f0a90fc8aa8e66e94fd3724c/src/array/cpu/spmat_op_impl_coo.cc#L426-L433

This is definitely not right because
`ret_indptr`, `ret_indices` and `ret_data` do not necessarily have the same dtype. Let's break them down in detail:

- `ret_indices`: its dtype is the same as `coo.row.dtype` (`IdType`). This is the only correct part of the current implementation.
- `ret_indptr`: its dtype depends on the number of nonzero elements (`NNZ`). If `IdType` is `int32` but `NNZ` exceeds `INT32_MAX`, then `ret_indptr.dtype` should be `int64`, not the same as `ret_indices`.
- `ret_data`: its dtype should be exactly the same as `coo.data`. It could be `int`, `float` or even `bool`, so it is not guaranteed to be the same as `ret_indices`.

Question
The dtypes of `ret_indptr`, `ret_indices` and `ret_data` may be completely different, so why do we set them all to `IdType`? `IdType` is only applicable to `ret_indices`. The dtype of `ret_indptr` should be determined dynamically, and `ret_data` is not even guaranteed to have an ID-like dtype.

Working Plan
1. Remove `template <class IdType>`, making `COOToCSR` a non-template function.
2. Determine the dtypes of `ret_indptr`, `ret_indices` and `ret_data` as follows:
   - `ret_indices.dtype` <- `coo.row.dtype`
   - `ret_indptr.dtype` <- `int64` if `NNZ` exceeds `INT32_MAX`, otherwise `int32` (however, if `coo.row` is `int64`, we set `ret_indptr` to `int64` anyway)
   - `ret_data.dtype` <- `coo.data.dtype` (if `coo.data` is null, use the same dtype as `ret_indptr`)

Reference
- #699: This 5-year-old PR implemented the first `COOToCSR` function with `template <DLDeviceType XPU, typename IdType, typename DType>`.
- #1251: This 4-year-old PR removed `typename DType` but kept `typename IdType`.
- #3326: This 3-year-old PR extended `COOToCSR` to `Sorted`, `UnSortedDense` and `UnSortedSparse` versions, but still kept `typename IdType`.

None of these venerable PRs explained why we need `IdType`.

Acknowledgement
#7364 for reporting this bug.