JuliaML / MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia
https://juliaml.github.io/MLDatasets.jl/stable
MIT License

OGBDataset ogbg-molhiv has wrong shape of edge features #223

Closed chrisn-pik closed 3 months ago

chrisn-pik commented 11 months ago

The edge features have the wrong dimension: the dataset reports 6 edge features, but there should be 3.

using MLDatasets, DataFrames

data = OGBDataset("ogbg-molhiv")
size(data[1].graphs.edge_data.features)

returns

(6, 20)

However,

data[1].graphs.num_edges

returns

40

I assume the correct size of the edge features would be (3, 40) instead of (6, 20).

I am using DataFrames v1.6.1 and MLDatasets v0.7.14

Using Python and ogb, the shape is (40, 3):

from ogb.graphproppred import PygGraphPropPredDataset
dataset = PygGraphPropPredDataset(name = "ogbg-molhiv", root = '.')
dataset[0].edge_attr.shape
torch.Size([40, 3])

and

dataset[0].edge_index.shape
torch.Size([2, 40])

CarloLucibello commented 11 months ago

It seems likely that the graph is undirected and that, when converting it to a directed one, we forgot to duplicate the edge features. Would you like to file a PR with a fix? I can give some directions.
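
To illustrate the idea, here is a minimal sketch of duplicating edge features when an undirected edge list is expanded into a directed one. It is not the MLDatasets.jl implementation: the helper `to_directed` is hypothetical, and the layout (a 2 × E `edge_index` and an F × E `edge_feat` with one column per edge) is an assumption chosen to match the sizes in the report.

```julia
# Hypothetical helper, not part of MLDatasets.jl: turn an undirected edge list
# (each edge stored once) into a directed one and duplicate the edge features.
# Assumed layout: `edge_index` is 2 × E, `edge_feat` is F × E (one column per edge).
function to_directed(edge_index::AbstractMatrix{<:Integer}, edge_feat::AbstractMatrix)
    size(edge_index, 1) == 2 || throw(ArgumentError("edge_index must be 2 × E"))
    size(edge_index, 2) == size(edge_feat, 2) ||
        throw(ArgumentError("need one feature column per edge"))

    # Reverse every edge by swapping the source and target rows.
    reversed = edge_index[[2, 1], :]

    # Append the reversed edges and repeat the feature columns in the same order,
    # so that column k and column E + k describe the same undirected edge.
    directed_index = hcat(edge_index, reversed)
    directed_feat = hcat(edge_feat, edge_feat)
    return directed_index, directed_feat
end

# Toy data with the sizes from the report: 20 undirected edges, 3 features each.
E, F = 20, 3
edge_index = vcat(collect(1:E)', collect(2:E+1)')   # path graph 1-2-...-21
edge_feat = reshape(collect(1:F*E), F, E)

directed_index, directed_feat = to_directed(edge_index, edge_feat)
println(size(directed_index))  # (2, 40)
println(size(directed_feat))   # (3, 40)
```

With this layout, concatenating along columns gives the expected (3, 40). Note that (6, 20) holds the same 120 entries as (3, 40), so the reported shape would also be consistent with the features being concatenated along the wrong dimension; that is only a guess and is not confirmed in this thread.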

chrisn-pik commented 11 months ago

Thanks for your fast reply. That sounds like a reasonable explanation. I am not sure if I will have the time, but some directions would be helpful in any case.