dice-group / WHALE


Clean and materialize domain specific dataset using sed #11

Open sshivam95 opened 5 months ago

sshivam95 commented 5 months ago

Next step to #9

sshivam95 commented 5 months ago

Update: materializing alone is enough for this; the dice-embeddings library already takes care of all the preprocessing.

sshivam95 commented 5 months ago

Datasets materialized:

sshivam95 commented 4 months ago

Same as https://github.com/dice-group/WHALE/issues/9#issuecomment-2175562375

sshivam95 commented 4 months ago

N-Quads to N-Triples for linking:
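
A minimal sketch of what such a conversion could look like, assuming one statement per line and that dropping the optional graph label is all that is needed. The function name `nquad_to_ntriple` and the regular expressions are illustrative and not taken from the WHALE repository. The patterns cover the common term shapes (IRIs, blank nodes, literals with an optional datatype or language tag) but do not validate the full N-Quads grammar.

```python
import re
from typing import Optional

# Simplified term patterns (assumption: well-formed input, one statement per line).
IRI = r'<[^<>"\s]*>'
BNODE = r'_:\S+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'

# subject predicate object [graph] .
QUAD = re.compile(
    rf'^\s*(?P<s>{IRI}|{BNODE})\s+(?P<p>{IRI})\s+(?P<o>{IRI}|{BNODE}|{LITERAL})'
    rf'(?:\s+(?P<g>{IRI}|{BNODE}))?\s*\.\s*$'
)


def nquad_to_ntriple(line: str) -> Optional[str]:
    """Return the N-Triples form of one N-Quads line, or None for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = QUAD.match(stripped)
    if match is None:
        raise ValueError(f"Not a recognizable N-Quads statement: {line!r}")
    # Keep subject, predicate, and object; discard the graph label if present.
    return f"{match['s']} {match['p']} {match['o']} ."


if __name__ == "__main__":
    sample = '<http://ex.org/s> <http://ex.org/p> "a b" <http://ex.org/g> .'
    print(nquad_to_ntriple(sample))
```

Matching all four terms explicitly, rather than deleting the last token before the final dot, avoids stripping the object from lines that carry no graph label.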