Closed gutihernandez closed 4 years ago
Hi,
Yeah, sorry, I uploaded these datasets at a time when the original datasets were unavailable, so all I had was my processed version of them. Since I didn't need the original mappings, that was enough for me.
You can download the original train / valid / test files from this URL: https://everest.hds.utc.fr/lib/exe/fetch.php?media=en:fb15k.tgz
Put them in the src_data/FB15K folder before running process_datasets.
This will yield the correct rel_id / ent_id files.
If you've already run process_datasets before, you'll have to remove the folder pkg_resources.resource_filename('kbc', 'data/FB15K') to force the dataset to be re-processed (or just use another name).
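In case it helps, removing the previously processed folder could look roughly like the sketch below. The helper name force_reprocess is made up for illustration and is not part of kbc; the only thing taken from the comment above is the pkg_resources.resource_filename('kbc', 'data/FB15K') path, shown in the trailing comment.

```python
import os
import shutil


def force_reprocess(processed_dir):
    """Delete a previously processed dataset folder, if it exists.

    Returns True if a folder was removed, False if there was nothing to remove.
    (Hypothetical helper, not part of kbc.)
    """
    if os.path.isdir(processed_dir):
        shutil.rmtree(processed_dir)
        return True
    return False


# Usage, assuming the kbc package is installed:
#   import pkg_resources
#   force_reprocess(pkg_resources.resource_filename('kbc', 'data/FB15K'))
# Then run process_datasets again.
```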
Perfect! Thank you for quickly responding :)
Could you also share the:
's mappings as well please? I tried to use the same link by changing fb15k.tgz into wn18.tgz, but apparently the file is named something else or is hosted somewhere else.
Thank you very much @timlacroix!
Hi! I have a question about datasets. After calling
python kbc/process_datasets.py
and downloading the datasets, I realized that the triples are stored in an "index" format, e.g. a triple looks like 2431 89 5452.
Where can I find a mapping from each of the indices this repository uses to its true label? For example, a labelled triple would look like:
/m/07l450 /film/film/genre /m/082gq
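For anyone landing here later: once the ent_id / rel_id files exist, turning an index triple back into labels might look like the sketch below. It assumes each file is plain text with one tab-separated label and index per line. That layout is an assumption, so check your own files first. The function names load_id_to_label and decode_triple are made up for illustration.

```python
import os


def load_id_to_label(path):
    """Read a file of `label<TAB>index` lines and return {index: label}.

    The file layout is an assumption; adjust the split if yours differs.
    """
    id_to_label = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the last tab so labels containing tabs still parse.
            label, idx = line.rsplit("\t", 1)
            id_to_label[int(idx)] = label
    return id_to_label


def decode_triple(triple, ent, rel):
    """Map an (lhs, relation, rhs) index triple to its labels."""
    lhs, r, rhs = triple
    return ent[lhs], rel[r], ent[rhs]


# Usage, with data_dir pointing at the processed dataset folder:
#   ent = load_id_to_label(os.path.join(data_dir, "ent_id"))
#   rel = load_id_to_label(os.path.join(data_dir, "rel_id"))
#   decode_triple((2431, 89, 5452), ent, rel)
```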