dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.23k stars 2.99k forks source link

[GraphBolt][CUDA] GPUCachedFeature update bug when feature dimensions differ #7377

Closed mfbalin closed 2 months ago

mfbalin commented 2 months ago

🔨Work Item

IMPORTANT:

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

https://github.com/dmlc/dgl/blob/67a897fa56353f766fc84303ff99b14c79c6a896/examples/sampling/graphbolt/node_classification.py#L224

GPUCachedFeature does not currently support updating the feature with different dimensions when initialized. Our inferencing loop uses the update function. Since layers have differing hidden dimensions, we get the error below.

We need to update GPUCachedFeature so that it reconstructs the GPUCache again when the feature dimension is changed.

This bug prevents us from using GPUCachedFeature in our single GPU examples where we have the inferencing loop.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
node_classification_advanced.py 458 <module>
main()

node_classification_advanced.py 444 main
test_acc = layerwise_infer(

_contextlib.py 115 decorate_context
return func(*args, **kwargs)

node_classification_advanced.py 271 layerwise_infer
pred = model.inference(graph, features, dataloader, args.feature_device)

node_classification_advanced.py 149 inference
features.update("node", None, "feat", y)

basic_feature_store.py 134 update
self._features[(domain, type_name, feature_name)].update(value, ids)

gpu_cached_feature.py 109 update
self._feature.replace(

gpu_cache.py 48 replace
self._cache.replace(keys, values)

RuntimeError:
Values should have the correct dimensions.
mfbalin commented 2 months ago

@Rhett-Ying @frozenbugs

mfbalin commented 2 months ago

This could be a release blocker. Now that we are aware of the issue, it would be really nice if next immediate release fixed it. I moved my place today so I will work on the fix tomorrow.