dice-group / WHALE


Create indexed file for dice-embedding training #7

Closed sshivam95 closed 2 weeks ago

sshivam95 commented 2 weeks ago

Use mmappickle to create entity_to_idx.p, relation_to_idx.p, and train_set.npy for the full dataset
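
As a rough illustration of the three target files, here is a minimal sketch. It deliberately swaps in the standard `pickle` module and NumPy's `open_memmap` instead of mmappickle, so it is not the approach from #4. The function name `build_index`, the int64 dtype, and the (head, relation, tail) triple layout are assumptions made for the example, not taken from the repository.

```python
import os
import pickle

import numpy as np
from numpy.lib.format import open_memmap


def build_index(triples, out_dir):
    """Write entity_to_idx.p, relation_to_idx.p and train_set.npy into out_dir.

    Hypothetical helper: `triples` is an iterable of (head, relation, tail) strings.
    """
    triples = list(triples)  # assumption: the raw triple list itself fits in memory
    entity_to_idx, relation_to_idx = {}, {}

    # Memory-mapped .npy file: rows are backed by disk rather than a full in-RAM array.
    train_path = os.path.join(out_dir, "train_set.npy")
    train_set = open_memmap(
        train_path, mode="w+", dtype=np.int64, shape=(len(triples), 3)
    )

    for row, (head, relation, tail) in enumerate(triples):
        # setdefault assigns the next free integer id on first sight of a label.
        h = entity_to_idx.setdefault(head, len(entity_to_idx))
        r = relation_to_idx.setdefault(relation, len(relation_to_idx))
        t = entity_to_idx.setdefault(tail, len(entity_to_idx))
        train_set[row] = (h, r, t)

    train_set.flush()  # make sure all rows reach the file
    del train_set

    with open(os.path.join(out_dir, "entity_to_idx.p"), "wb") as f:
        pickle.dump(entity_to_idx, f)
    with open(os.path.join(out_dir, "relation_to_idx.p"), "wb") as f:
        pickle.dump(relation_to_idx, f)

    return entity_to_idx, relation_to_idx
```

The resulting `train_set.npy` can later be opened with `np.load(path, mmap_mode="r")` so that it is read lazily from disk. The two dictionaries still live in memory here, which is the part a memory-mapped dictionary would replace.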

sshivam95 commented 2 weeks ago

Use mmappickle.mmapdict approach from #4 to tackle this issue.

sshivam95 commented 2 weeks ago

#4 is very slow. Use #5 to test faster execution on a memory-mapped file.

sshivam95 commented 2 weeks ago

Update: As stated in issue #9, this approach is only useful if the dataset is larger than the node size provided in the cluster. Therefore, closing this for now. Reopen if this method is needed in the future.
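
The size check behind this decision can be sketched as a back-of-the-envelope estimate. This assumes "node size" means the node's available memory and that the indexed training set is stored as one integer triple per row. The function name and the 8-byte id width are hypothetical choices for illustration.

```python
def exceeds_node_memory(num_triples, node_memory_bytes, bytes_per_id=8):
    """Return True if a (num_triples, 3) integer array would not fit in node memory.

    Assumption: each of head, relation and tail is stored as one integer id
    of `bytes_per_id` bytes; index dictionaries and overhead are ignored.
    """
    train_set_bytes = num_triples * 3 * bytes_per_id
    return train_set_bytes > node_memory_bytes
```

Under these assumptions, a memory-mapped approach would only be worth revisiting when this check returns `True` for the full dataset on the target node.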