awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0

DGL assert fails when setting `exclude_positive=True` in Sampler #71

Closed. asaluja closed this issue 4 years ago.

asaluja commented 4 years ago

I needed to make some changes to the train script. In particular, I wanted to set `exclude_positive=True` on the head and tail samplers used in training (here and here), but as soon as I do that, this assert in DGL fails:

python3: /opt/dgl/src/graph/sampler.cc:1186: dgl::NegSubgraph dgl::{anonymous}::EdgeSamplerObject::genNegEdgeSubgraph(const dgl::Subgraph&, const string&, int64_t, bool, bool): Assertion `prev_neg_offset + neg_sample_size == neg_vids.size()' failed.

Am I using the sampler incorrectly here?
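
For reference, the change I made is roughly the following sketch of the two `create_sampler` calls in `train.py`. The argument layout is assumed from `TrainDataset.create_sampler` in `dglke/dataloader/sampler.py` and may differ slightly across versions:

```python
# Sketch: flip exclude_positive on the head/tail training samplers in train.py.
# Argument names and order are assumed from TrainDataset.create_sampler.
train_sampler_head = train_data.create_sampler(args.batch_size,
                                               args.neg_sample_size,
                                               args.neg_sample_size,   # neg_chunk_size
                                               mode='head',
                                               num_workers=args.num_workers,
                                               shuffle=True,
                                               exclude_positive=True)  # was False
train_sampler_tail = train_data.create_sampler(args.batch_size,
                                               args.neg_sample_size,
                                               args.neg_sample_size,   # neg_chunk_size
                                               mode='tail',
                                               num_workers=args.num_workers,
                                               shuffle=True,
                                               exclude_positive=True)  # was False
```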

classicsong commented 4 years ago

One possibility is that the sampler cannot get enough negative nodes for a batch. How large is your graph? Can you try reducing your batch size?

asaluja commented 4 years ago

I have around 1 million nodes and 4.5 million edges. I don't think it's the batch size: I've tried reducing the batch size to 1 and the number of negative samples to 1, and I still get the same error.
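
For what it's worth, the call that hits the assert boils down to something like this standalone sketch (a toy graph stands in for my real data; this assumes DGL 0.4.x, where `dgl.contrib.sampling.EdgeSampler` is the sampler dgl-ke wraps):

```python
import dgl
import torch as th

# Hypothetical toy graph standing in for my real one (~1M nodes, ~4.5M edges).
num_nodes, num_edges = 1000, 5000
src = th.randint(0, num_nodes, (num_edges,))
dst = th.randint(0, num_nodes, (num_edges,))
g = dgl.DGLGraph()
g.add_nodes(num_nodes)
g.add_edges(src, dst)
g.readonly()  # EdgeSampler requires a read-only graph

sampler = dgl.contrib.sampling.EdgeSampler(g,
                                           batch_size=1,           # already reduced to 1
                                           negative_mode='head',
                                           neg_sample_size=1,      # already reduced to 1
                                           exclude_positive=True)  # the flag in question

for pos_g, neg_g in sampler:  # the assert fires while iterating
    break
```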

classicsong commented 4 years ago

I think you can refer to how the eval sampler samples edges: https://github.com/awslabs/dgl-ke/blob/de4e970c9ffa45cae7d74139f4b3b38365f6c3ad/python/dglke/dataloader/sampler.py#L500

This gives another way to exclude positive edges.
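
Concretely, the eval sampler keeps `exclude_positive=False` but passes `return_false_neg=True`, then reads the `false_neg` edge feature of the negative subgraph to mask out negatives that are actually true edges. A rough sketch of that pattern (names are illustrative; see `EvalSampler` in `dglke/dataloader/sampler.py` for the real code):

```python
import dgl

def create_filtered_neg_sampler(g, seed_edges, batch_size, neg_sample_size,
                                neg_chunk_size, mode):
    """Sketch of the EvalSampler approach: do not exclude positives at sample
    time, but ask the sampler to flag sampled negatives that are real edges."""
    EdgeSampler = getattr(dgl.contrib.sampling, 'EdgeSampler')
    return EdgeSampler(g,
                       batch_size=batch_size,
                       seed_edges=seed_edges,
                       neg_sample_size=neg_sample_size,
                       chunk_size=neg_chunk_size,
                       negative_mode=mode,       # 'head' or 'tail'
                       shuffle=False,
                       exclude_positive=False,
                       return_false_neg=True)

# Usage sketch: turn the false-negative flags into a bias, as EvalSampler does,
# so the scores of false negatives can be filtered out downstream.
# for pos_g, neg_g in create_filtered_neg_sampler(...):
#     false_neg = neg_g.edata['false_neg']   # 1 where the sampled negative is a true edge
#     neg_g.edata['bias'] = -false_neg.float()
```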

asaluja commented 4 years ago

OK, thanks, I'll look into that. I'm wondering, though, whether the above is a bug or whether I'm doing something wrong?

classicsong commented 4 years ago

You need to use `return_false_neg`, and we will look into whether there is a bug in `exclude_positive=True`.