commonsense / conceptnet5

Code for building ConceptNet from raw data.

Where can I find the files: vectors/numberbatch.h5 and vectors/evaluation.h5? #164

Closed: terU3760 closed this issue 6 years ago

terU3760 commented 6 years ago

I followed the build process described on this page: https://github.com/commonsense/conceptnet5/wiki/Build-process. It says:

"Some other files you can build by request (type snakemake followed by the file name):

- vectors/numberbatch.h5: the full ConceptNet Numberbatch matrix, with a larger vocabulary and more precision than vectors/mini.h5
- vectors/evaluation.h5: evaluation results comparing numberbatch.h5 to other pre-computed word embeddings"

But I can't find the files vectors/numberbatch.h5 and vectors/evaluation.h5. Where can I find them, or are they deprecated?

jlowryduda commented 6 years ago

Which commands did you run? Specifically, did you run snakemake vectors/numberbatch.h5?

terU3760 commented 6 years ago

@jlowryduda I cloned https://github.com/commonsense/conceptnet5.git into a local folder named conceptnet5. I can only find folders named "vectors" inside conceptnet5, conceptnet5/data, conceptnet5/data/raw, and conceptnet5/testdata/raw, and the only one that seems to match is conceptnet5/data/vectors. But that folder contains only a file named numberbatch-biased.h5; there are no files named numberbatch.h5 or evaluation.h5.
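A quick way to check which of these files you actually have is to list every .h5 file under the data directory. This is a small sketch, not part of ConceptNet: the helper names are hypothetical, and the expected paths are the ones from the snakemake commands suggested later in this thread (data/vectors/numberbatch.h5 and data/stats/evaluation.h5), written relative to conceptnet5/data.

```python
from pathlib import Path

# Build outputs discussed in this thread, relative to conceptnet5/data.
EXPECTED = ("vectors/numberbatch.h5", "stats/evaluation.h5")


def find_h5_files(root):
    """Return every .h5 file below `root` as sorted, POSIX-style relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.h5"))


def missing_outputs(root, expected=EXPECTED):
    """Return the expected files that do not exist (yet) under `root`."""
    found = set(find_h5_files(root))
    return [rel for rel in expected if rel not in found]
```

For example, missing_outputs("conceptnet5/data") returns both expected paths if numberbatch-biased.h5 is the only vector file present, which means those targets still have to be requested from Snakemake.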

jlowryduda commented 6 years ago

Try running the following commands: snakemake data/vectors/numberbatch.h5 and snakemake data/stats/evaluation.h5. The first command generates the unbiased numberbatch.h5 from numberbatch-biased.h5, while the second one just generates evaluation.h5 (you need to request it specifically).
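If you want to script these two targets, a minimal sketch might look like the following. It is not part of ConceptNet and the helper names are hypothetical. It assumes a Snakemake version that accepts --cores (recent releases require a core count, although the commands above omit it) and --dryrun, which lists the jobs Snakemake would run without executing them.

```python
import shutil
import subprocess

# Targets named in this thread; paths are relative to the conceptnet5 checkout.
TARGETS = ("data/vectors/numberbatch.h5", "data/stats/evaluation.h5")


def snakemake_command(target, cores=1, dry_run=False):
    """Build the argument list for one `snakemake` invocation."""
    cmd = ["snakemake", "--cores", str(cores)]
    if dry_run:
        # Only report which jobs would run; nothing is built.
        cmd.append("--dryrun")
    cmd.append(target)
    return cmd


def build(target, cores=1, dry_run=False, cwd="."):
    """Run Snakemake for one target from the repository root (`cwd`)."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake is not installed or not on PATH")
    # check=True raises CalledProcessError if Snakemake exits non-zero.
    subprocess.run(snakemake_command(target, cores, dry_run), cwd=cwd, check=True)
```

For example, build("data/stats/evaluation.h5", dry_run=True) shows how many jobs stand between you and evaluation.h5 before you commit to a long-running build.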

terU3760 commented 6 years ago

@jlowryduda Thank you so much! I ran snakemake data/vectors/numberbatch.h5 and snakemake data/stats/evaluation.h5. The latter seems to never finish. How long should it take?

rspeer commented 6 years ago

I don't know. Did it ever finish? The evaluation code was designed for an earlier version of ConceptNet; it runs in a couple of hours if you're on the aaai2017-evaluation branch, for example.

It always took a long time because it's doing parameter search (finding the best parameters for all the systems we compare against, not just our own), and it's possible that now things have gotten complex enough that the parameter search will just take way too long.
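To see why an exhaustive parameter search gets slow, here is a generic grid-search sketch. It is not ConceptNet's evaluation code: the parameter names, grid values, and scoring function are made up for illustration. The point is that the number of evaluations is the product of the grid sizes, multiplied again by the number of systems being tuned, so every added parameter or compared system multiplies the total runtime.

```python
from itertools import product
from math import prod


def grid_search(score, grid):
    """Try every combination in `grid`; return (best_params, best_score, n_evaluated)."""
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)  # in a real evaluation, this is the expensive step
        n_evaluated += 1
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score, n_evaluated


def total_evaluations(grid, n_systems):
    """Evaluations needed to tune `n_systems` systems that share the same grid."""
    return n_systems * prod(len(values) for values in grid.values())


# Hypothetical grid: three values for each of two made-up parameters.
grid = {"alpha": [0.1, 0.5, 0.9], "k": [5, 10, 20]}
best, best_score, n = grid_search(
    lambda p: -(p["alpha"] - 0.5) ** 2 - (p["k"] - 10) ** 2, grid
)
```

With this toy grid, n is 9 and total_evaluations(grid, n_systems=4) is 36. A grid of five parameters with ten values each would already need 100,000 evaluations per system.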