dice-group / WHALE

Create indexed file for dice-embedding training #7

Closed by sshivam95 6 months ago

sshivam95 commented 6 months ago

Use mmappickle to create entity_to_idx.p, relation_to_idx.p, and train_set.npy for the full dataset
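
A minimal sketch of what producing these three files could look like. It does not use mmappickle: it swaps in the standard-library `pickle` for the two dictionaries and `numpy.lib.format.open_memmap` for the triple array, so that `train_set.npy` is written through a memory map rather than built in RAM. The input format (one whitespace-separated `head relation tail` triple per line), the `build_index` helper, and the demo data are assumptions made for illustration and are not taken from the WHALE code base.

```python
import os
import pickle
import tempfile

import numpy as np


def build_index(triples_path, out_dir):
    """Two-pass indexing of a triple file (hypothetical helper)."""
    entity_to_idx, relation_to_idx = {}, {}
    num_triples = 0

    # Pass 1: assign consecutive integer ids in order of first appearance.
    with open(triples_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            h, r, t = parts[:3]
            entity_to_idx.setdefault(h, len(entity_to_idx))
            relation_to_idx.setdefault(r, len(relation_to_idx))
            entity_to_idx.setdefault(t, len(entity_to_idx))
            num_triples += 1

    # Pass 2: write the integer triples straight into a memory-mapped .npy
    # file, so the full array never has to be held in memory.
    train_path = os.path.join(out_dir, "train_set.npy")
    train = np.lib.format.open_memmap(
        train_path, mode="w+", dtype=np.int64, shape=(num_triples, 3)
    )
    with open(triples_path, encoding="utf-8") as f:
        i = 0
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            h, r, t = parts[:3]
            train[i] = (entity_to_idx[h], relation_to_idx[r], entity_to_idx[t])
            i += 1
    train.flush()
    del train  # release the memory map

    # The dictionaries themselves are still built in memory here; only the
    # triple array is memory mapped.
    with open(os.path.join(out_dir, "entity_to_idx.p"), "wb") as f:
        pickle.dump(entity_to_idx, f)
    with open(os.path.join(out_dir, "relation_to_idx.p"), "wb") as f:
        pickle.dump(relation_to_idx, f)

    return entity_to_idx, relation_to_idx, train_path


# Tiny demo on made-up triples.
out_dir = tempfile.mkdtemp()
triples_path = os.path.join(out_dir, "triples.txt")
with open(triples_path, "w", encoding="utf-8") as f:
    f.write("a knows b\nb knows c\na likes c\n")

entity_to_idx, relation_to_idx, train_path = build_index(triples_path, out_dir)
loaded = np.load(train_path, mmap_mode="r")  # read back without loading fully
print(loaded.shape)
```

Reading the file twice trades extra I/O for a known array shape up front, which `open_memmap` needs when it creates the file. For a dataset whose dictionaries do not fit in memory either, a disk-backed mapping would be needed in place of the plain `dict` objects used in this sketch.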

sshivam95 commented 6 months ago

Use the mmappickle.mmapdict approach from #4 to tackle this issue.

sshivam95 commented 6 months ago

#4 is very slow. Use #5 to test faster execution on a memory-mapped file.

sshivam95 commented 6 months ago

Update: As stated in issue #9, this approach is only useful if the size of the dataset is greater than the node size provided in the cluster. Therefore, closing this for now. Reopen if this method is needed in the future.