Accenture / AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Apache License 2.0

Support for a large number of entities (>1M) #61

Closed sumitpai closed 5 years ago

sumitpai commented 5 years ago

Background and Context

Currently, if the number of unique entities is large (>1M), the system may overload GPU memory.

Description

We load all the entities into memory, and during evaluation we generate all the corruptions at once. There should be a mechanism to batch this, so that when the number of entities is very large we do not overload the GPU and run out of memory.
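The batching idea above can be sketched in plain NumPy. This is a toy stand-in, not AmpliGraph's implementation: `score_fn` below is a TransE-style scorer invented for illustration, and the point is only that corruptions are scored in fixed-size chunks instead of materializing all |E| corrupted triples at once.

```python
import numpy as np

def score_fn(s_emb, r_emb, o_emb):
    # Toy TransE-style score: -||s + r - o||. Stand-in for the real model scorer.
    return -np.linalg.norm(s_emb + r_emb - o_emb, axis=1)

def scores_against_all_objects(s, r, ent_emb, rel_emb, batch_size=1024):
    """Score (s, r, ?) against every entity, one chunk at a time, so that
    at most `batch_size` corruptions exist in memory at any moment."""
    n = ent_emb.shape[0]
    out = np.empty(n, dtype=np.float64)
    for start in range(0, n, batch_size):
        o_batch = ent_emb[start:start + batch_size]          # chunk of candidate objects
        s_batch = np.broadcast_to(ent_emb[s], o_batch.shape)  # repeat subject embedding
        r_batch = np.broadcast_to(rel_emb[r], o_batch.shape)  # repeat relation embedding
        out[start:start + o_batch.shape[0]] = score_fn(s_batch, r_batch, o_batch)
    return out
```

With millions of entities, the peak memory per evaluation step is then bounded by `batch_size × embedding_dim` rather than `|E| × embedding_dim`.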

sumitpai commented 5 years ago

Updated the prime numbers list to contain the first 5 million primes. Tested on a dummy dataset with 1.7 million unique entities and a few hundred relations.

sumitpai commented 5 years ago

The dependency on prime numbers is to be eliminated by using a database. See #74

idigitopia commented 5 years ago

I am getting an out-of-memory error when running this on around 3.8 million triples. I did not quite understand what the problem was, though. Is this yet to be fixed, or did I miss something in my code? :)

2019-06-03 21:39:29.697798: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.21GiB. Current allocation summary follows.

2019-06-03 21:39:39.699882: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ****____*****_____*
2019-06-03 21:39:39.699925: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at random_op.cc:202 : Resource exhausted: OOM when allocating tensor with shape[1486430,400] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):

idigitopia commented 5 years ago

Oh I see, is the temporary fix in some other branch? :)

sumitpai commented 5 years ago

Yeah, it's not on master; it is on feature/74. It's not completely integrated, so you may have to wait a few days until we fully integrate and test it before releasing.

sumitpai commented 5 years ago

But even the master branch should work with 3.8 million triples. We have tested with close to 4 million unique concepts and 22 million triples. You should try increasing batches_count (maybe set it to 1000 or 10000) during training.
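The arithmetic behind that suggestion can be sketched as follows. This is an illustrative helper, not AmpliGraph code: it assumes (as the parameter name suggests) that `batches_count` divides the training set into that many batches, so a larger value means a smaller per-batch tensor on the GPU.

```python
import math

def batch_sizes(n_triples, batches_count):
    """Split n_triples into batches_count batches of (near-)equal size.
    A larger batches_count gives smaller batches and a smaller peak
    GPU footprint per training step."""
    size = math.ceil(n_triples / batches_count)
    return [min(size, n_triples - start) for start in range(0, n_triples, size)]
```

For example, with the 22 million triples mentioned above and `batches_count=10000`, each batch holds only 2,200 triples, so the per-step memory cost scales with the batch, not with the whole training set.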

sumitpai commented 5 years ago

Added support for millions of entities by lazily loading entity embeddings: only the embeddings needed for the current batch are loaded into GPU memory, during both training and evaluation.
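The lazy-loading idea can be sketched in plain NumPy. This is a toy model of the technique, not the actual implementation: the full embedding table lives in host memory, only the entities touched by the current batch are gathered into a small "device" array, and the updated rows are scattered back afterwards. The update rule itself is an invented placeholder for the real gradient step.

```python
import numpy as np

def train_step_lazy(batch_triples, cpu_emb, lr=0.01):
    """Gather only the embeddings used by this batch onto the 'device'
    (here just a small array), update them, and scatter them back.
    Toy update: nudge each subject toward its object."""
    used = np.unique(batch_triples[:, [0, 2]])   # entity ids appearing in the batch
    idx = {e: i for i, e in enumerate(used)}     # global id -> local slot
    dev_emb = cpu_emb[used].copy()               # small gathered table ("on GPU")
    for s, _, o in batch_triples:
        grad = dev_emb[idx[o]] - dev_emb[idx[s]]  # placeholder for the real gradient
        dev_emb[idx[s]] += lr * grad
    cpu_emb[used] = dev_emb                      # scatter updates back to host
    return used
```

The peak device memory is then proportional to the number of distinct entities in a batch, not to the total number of entities in the graph.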

sumitpai commented 5 years ago

The fix is on branch feature/74. To be released (on master) in AmpliGraph 1.1.