HKU-MedAI / WSI-HGNN

[CVPR'23] Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning

Hi, I’ve been facing a weird problem in DGL heterogeneous processing during training. #7

Closed Noirombre closed 10 months ago

Noirombre commented 10 months ago

(pytorch) zhangyuedi@csr-SYS-4028GR-TR:~/His$ /home/zhangyuedi/anaconda3/envs/pytorch/bin/python /home/zhangyuedi/His/WSI-HGNN-main/main.py
Loaded configs from /home/zhangyuedi/His/WSI-HGNN-main/configs/BRCA/HEAT4_kimia_classification_v2.yml
Start training Homogeneous GNN
  0%| | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/zhangyuedi/His/WSI-HGNN-main/main.py", line 65, in <module>
    main()
  File "/home/zhangyuedi/His/WSI-HGNN-main/main.py", line 50, in main
    trainer.train()
  File "/home/zhangyuedi/His/WSI-HGNN-main/trainer/train_gnn.py", line 96, in train
    for graphs, label in self.dataloader:
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1376, in collate
    return [self.collate(samples) for samples in transposed]
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1376, in <listcomp>
    return [self.collate(samples) for samples in transposed]
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1330, in collate
    batched_graphs = batch_graphs(items)
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/batch.py", line 173, in batch
    gidx = disjoint_union(
  File "/home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/heterograph_index.py", line 1432, in disjoint_union
    return _CAPI_DGLHeteroDisjointUnion_v2(metagraph, graphs)
  File "dgl/_ffi/_cython/./function.pxi", line 295, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 227, in dgl._ffi._cy3.core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 217, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [17:24:52] /opt/dgl/src/graph/unit_graph.cc:1195: Check failed: mat.num_rows == mat.num_cols (36 vs. 19) :
Stack trace:
  [bt] (0) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(+0x86d61a) [0x7fb1c666d61a]
  [bt] (1) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(dgl::UnitGraph::CreateFromCOO(long, dgl::aten::COOMatrix const&, unsigned char)+0x2a8) [0x7fb1c666ff28]
  [bt] (2) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(dgl::DisjointUnionHeteroGraph2(std::shared_ptr, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&)+0x408) [0x7fb1c6660f68]
  [bt] (3) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(+0x78926e) [0x7fb1c658926e]
  [bt] (4) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(+0x789414) [0x7fb1c6589414]
  [bt] (5) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fb1c650e3f8]
  [bt] (6) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so(+0x15413) [0x7fb1c4a15413]
  [bt] (7) /home/zhangyuedi/anaconda3/envs/pytorch/lib/python3.10/site-packages/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so(+0x15c2b) [0x7fb1c4a15c2b]
  [bt] (8) /home/zhangyuedi/anaconda3/envs/pytorch/bin/python(_PyObject_MakeTpCall+0x26b) [0x557a527709db]

howardchanth commented 10 months ago

Hi there,

I also encountered the same error when trying to batch heterogeneous graphs. It seems that our heterogeneous graph (HG) design has some compatibility issues with the DGL library (in any version). Since batching does not have a large impact on running time, I worked around the problem by setting the batch size to 1 and manually loading the graphs into a list. You could give that a shot as a temporary solution. I will update the code if I find a proper fix. Thanks.
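
The workaround described above (batch size 1, graphs kept in a plain list) can be sketched roughly as follows. This is only an illustration, not the repository's actual code: `collate_as_list` and `iterate_one_by_one` are hypothetical helper names, and the string placeholders stand in for real DGL heterogeneous graphs. The idea is that the graphs are never merged into one batched graph, so the disjoint-union step that raises the error in the traceback is not reached.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Sample = Tuple[Any, Any]  # (graph, label); the graph type is left open on purpose


def collate_as_list(samples: Sequence[Sample]) -> Tuple[List[Any], List[Any]]:
    """Keep graphs and labels in plain lists instead of merging the graphs.

    Hypothetical helper: it only regroups the samples and never calls
    any graph-batching routine.
    """
    graphs = [graph for graph, _ in samples]
    labels = [label for _, label in samples]
    return graphs, labels


def iterate_one_by_one(dataset: Sequence[Sample]) -> Iterator[Tuple[List[Any], List[Any]]]:
    """Yield one-element batches, mimicking a data loader with batch size 1."""
    for sample in dataset:
        yield collate_as_list([sample])


if __name__ == "__main__":
    # Placeholder strings stand in for heterogeneous graphs (assumed data).
    toy_dataset = [("graph_a", 0), ("graph_b", 1), ("graph_c", 0)]
    for graphs, labels in iterate_one_by_one(toy_dataset):
        for graph, label in zip(graphs, labels):
            # The model's forward pass on a single graph would go here.
            print(graph, label)
```

In a real training loop, a function like `collate_as_list` could presumably be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`, with the trainer then looping over the returned list one graph at a time; whether that matches how the repository's trainer consumes its batches is an assumption.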