Closed · sumitpai closed this 5 years ago
Updated the prime numbers list to contain the first 5 million primes. Tested on a dummy dataset of 1.7 million unique entities and a few hundred relations.
The dependency on prime numbers is to be eliminated with the use of databases. See #74
I am getting an out-of-memory error when running this on around 3.8 million tuples. I did not quite understand what the problem was, though. Is this yet to be fixed, or did I miss something in my code? :)
```
2019-06-03 21:39:29.697798: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.21GiB. Current allocation summary follows.
2019-06-03 21:39:39.699882: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ****____*****_____*
2019-06-03 21:39:39.699925: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at random_op.cc:202 : Resource exhausted: OOM when allocating tensor with shape[1486430,400] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
```
Oh I see, is the temporary fix in some other branch? :)
Yeah, it's not on master; it's on feature/74. It's not completely integrated, so you may have to wait a few days while we integrate it fully and test it before releasing.
But even the master branch should work on 3.8 million tuples. We have tested with close to 4 million unique concepts and 22 million tuples. You should try increasing batches_count (maybe set it to 1000 or 10000) during training.
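To see why a larger batches_count helps, note that the training set is split into that many batches, so each step only materialises one batch of triples (and their negatives) on the GPU. The numbers below are illustrative, not taken from the actual run:

```python
# Illustration: a larger batches_count shrinks the per-batch triple count,
# and with it the peak GPU memory needed for a single training step.

def batch_size(num_triples: int, batches_count: int) -> int:
    """Triples processed per batch (ceiling division)."""
    return -(-num_triples // batches_count)

num_triples = 3_800_000  # roughly the dataset size reported above

for bc in (100, 1000, 10000):
    print(f"batches_count={bc:>5} -> {batch_size(num_triples, bc)} triples per batch")
```

If I recall the 1.x API correctly, batches_count is passed to the model constructor, e.g. ComplEx(batches_count=1000, ...); check the documentation of your installed version to be sure.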
Supported millions of entities by lazy loading of entity embeddings. Only the necessary embeddings are loaded into GPU memory, both during training and evaluation.
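The lazy-loading idea above can be sketched as follows; the names and the dict-backed store are hypothetical stand-ins, not the actual implementation on feature/74:

```python
# Sketch: fetch only the embedding rows needed for the current batch,
# instead of keeping the whole embedding table in GPU memory.

# Stand-in for the full embedding store (e.g. on disk or in host memory).
embedding_table = {e: [float(e)] * 4 for e in range(10)}

def load_batch_embeddings(entity_ids):
    # Only the unique entities appearing in this batch are materialised.
    return {e: embedding_table[e] for e in set(entity_ids)}

batch = [1, 3, 3, 7]          # entities referenced by one batch of triples
loaded = load_batch_embeddings(batch)
print(sorted(loaded))          # only 3 rows loaded, not all 10
```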
Fix is on branch feature/74. To be released (on master) in AmpliGraph 1.1.
Background and Context

Currently, if the number of unique entities is large (>1M), the system might overload GPU memory.

Description

We load all the entities into memory, and during evaluation we generate all the corruptions at once. There should be a mechanism to batch this so that, when the number of entities is too large, we do not overload the GPU and run out of memory.
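The batched-corruption mechanism described above could look roughly like this sketch; score() and the function names are hypothetical placeholders for the model's scoring function, and the real fix on feature/74 may differ:

```python
# Sketch: rank one test triple against entity corruptions in chunks,
# instead of materialising all N corruptions at once.

def score(subject: int, predicate: int, obj: int) -> float:
    # Toy scoring function for illustration only.
    return -abs(subject + predicate - obj)

def rank_object(subject, predicate, true_obj, num_entities, chunk_size=2):
    """Rank of the true object among all entity corruptions (1 = best)."""
    true_score = score(subject, predicate, true_obj)
    better = 0
    # Only chunk_size candidate corruptions exist in memory at a time.
    for start in range(0, num_entities, chunk_size):
        for candidate in range(start, min(start + chunk_size, num_entities)):
            if candidate != true_obj and score(subject, predicate, candidate) > true_score:
                better += 1
    return better + 1

print(rank_object(1, 2, 3, num_entities=6))  # prints 1: the true object scores best
```

Because each chunk is scored and discarded, peak memory depends on chunk_size rather than on the total number of entities.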