JuliaML / MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia
https://juliaml.github.io/MLDatasets.jl/stable
MIT License

Split for OGBDatasets #172

Closed: Dsantra92 closed this 2 years ago

Dsantra92 commented 2 years ago

This PR adds support for link splits on simple graphs (#107), as well as support for splits in heterogeneous OGB datasets.

Needs fixing before merge: #167

Dsantra92 commented 2 years ago

Tests and some fixes are still pending.

codecov-commenter commented 2 years ago

Codecov Report

Merging #172 (5af4f34) into master (c859843) will increase coverage by 4.10%. The diff coverage is 68.67%.

@@            Coverage Diff             @@
##           master     #172      +/-   ##
==========================================
+ Coverage   33.01%   37.12%   +4.10%     
==========================================
  Files          41       42       +1     
  Lines        2084     2182      +98     
==========================================
+ Hits          688      810     +122     
+ Misses       1396     1372      -24     
| Impacted Files | Coverage Δ |
|---|---|
| src/datasets/graphs/ogbdataset.jl | 82.36% <68.67%> (+25.09%) ↑ |
| src/datasets/graphs/polblogs.jl | 84.21% <0.00%> (-5.27%) ↓ |
| src/datasets/vision/emnist.jl | 5.00% <0.00%> (-5.00%) ↓ |
| src/datasets/graphs/reddit.jl | 2.12% <0.00%> (-2.13%) ↓ |
| src/datasets/graphs/tudataset.jl | 1.36% <0.00%> (-1.37%) ↓ |
| src/datasets/vision/cifar10.jl | 1.20% <0.00%> (-1.21%) ↓ |
| src/datasets/vision/cifar100.jl | 1.20% <0.00%> (-1.21%) ↓ |
| src/abstract_datasets.jl | 21.87% <0.00%> (-0.71%) ↓ |
| src/datasets/graphs/movielens.jl | 0.36% <0.00%> (-0.37%) ↓ |
| src/MLDatasets.jl | 100.00% <0.00%> (ø) |

... and 4 more


Dsantra92 commented 2 years ago

The split for heterogeneous datasets with edge-level tasks still needs to be addressed. Now that I have had a chance to think about it, I really don't want to change the HeteroGraph API.

CarloLucibello commented 2 years ago

Can we add a test with the dataset mentioned in https://github.com/JuliaML/MLDatasets.jl/issues/107#issuecomment-1120384637?

> The split for heterogeneous datasets with edge-level tasks needs to be addressed.

It can be done in a separate PR if it is too involved.

Dsantra92 commented 2 years ago

Merge #170 before this one.

Dsantra92 commented 2 years ago

I think we are in a good position to merge this, @CarloLucibello.