dice-group / WHALE


Implement memory map approach using `mmappickle` #4

Closed sshivam95 closed 6 months ago

sshivam95 commented 6 months ago

Dice-embedding does not support reading from memory-mapped files. It reads the whole file directly into main memory, which causes out-of-memory problems when the knowledge base file is larger than main memory. Here we use the `mmappickle` library, which stores pickled objects in a memory-mapped file, to create the indices of relations and entities.

This helps transform the training set into a `numpy.ndarray` of indexed training data.
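To make the idea concrete, here is a minimal sketch of the indexing step, not the actual WHALE implementation. It assumes a plain-text knowledge base with one whitespace-separated `head relation tail` triple per line, and it uses `numpy.memmap` as a stand-in for `mmappickle` so that the indexed training array lives on disk rather than in RAM. The function name `build_indexed_train_set` and the file layout are hypothetical. In this sketch the entity and relation dictionaries still live in main memory; keeping them in a memory-mapped pickle is what `mmappickle` would be used for.

```python
import numpy as np


def build_indexed_train_set(kg_path, out_path):
    """Stream a triple file twice and write integer-indexed triples to disk.

    Hypothetical helper: assumes one whitespace-separated
    'head relation tail' triple per line; shorter lines are skipped.
    """
    # Pass 1: count valid triples so the on-disk array can be sized up front.
    with open(kg_path, encoding="utf-8") as f:
        n = sum(1 for line in f if len(line.split()) >= 3)
    if n == 0:
        raise ValueError("no triples found in " + kg_path)

    # The array is backed by a file, so it does not need to fit in RAM.
    train = np.memmap(out_path, dtype=np.int64, mode="w+", shape=(n, 3))

    # Ids are assigned in first-seen order.
    entity_to_idx, relation_to_idx = {}, {}

    # Pass 2: read line by line and fill the memory-mapped array.
    with open(kg_path, encoding="utf-8") as f:
        i = 0
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            h, r, t = parts[:3]
            train[i, 0] = entity_to_idx.setdefault(h, len(entity_to_idx))
            train[i, 1] = relation_to_idx.setdefault(r, len(relation_to_idx))
            train[i, 2] = entity_to_idx.setdefault(t, len(entity_to_idx))
            i += 1

    train.flush()  # make sure the indexed triples reach the file
    return train, entity_to_idx, relation_to_idx
```

The returned `train` object is a `numpy.memmap`, which is a subclass of `numpy.ndarray`, so it can be sliced like an ordinary array while the operating system pages data in on demand. Reading the file twice trades some I/O for never holding the raw triples in memory.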

sshivam95 commented 6 months ago

Fix

sshivam95 commented 6 months ago

Test on the extracted files --> Running on extracted 10M Noctua 1 - (3066765)