Closed mahsash closed 4 years ago
Hi,
Can you share your settings.json file, please? Removing the cache was only necessary in #2 because he wanted to use another KG. If you only want results for another domain, you don't need to do this.
Regards, Philipp
Thanks for your reply.
I'm also using another KG (a small subset of Wikidata). This is my settings.json file:
    {
        "tagMe_token": "MyToken",
        "hyperparameters_frontier_detection": [0.6, 0.3, 0.1],
        "hyperparameters_answer_detection": [0.9, 0.1],
        "number_of_frontier_nodes": 3,
        "domain": "movies",
        "domain_options": ["books", "movies", "music", "soccer", "tv_series", "ALL"],
        "conversations_path": "data/test_set/test_set_ALL.json",
        "wikidata_dump_path": "/wikidata.hdt"
    }
And I commented out the first part of the code in wikidata.py that loads cached data and changed it to:
    identifier_predicates = {}
    label_dict = {}
    predicate_frequencies_dict = {}
    entity_frequencies_dict = {}
    statements_dict = {}
CONVEX assumes the standard Wikidata format for the HDT file, in which triples look something like: "http://www.wikidata.org/entity/Q123 http://www.wikidata.org/prop/direct/P123 *". My guess is that no facts are fetched from the KG in your case, i.e. the KG lookups return empty results. In that case only the existential questions are answered, and since these do not depend on the hyperparameters, the results stay the same across settings.
However, results should definitely change if you switch the domain. I also verified this once more. Is it that results are all the same across domains, or is it that results stay the same for different parameters for one domain?
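To check whether a dump follows the URI patterns described above, something like the following could be used. It is only a minimal sketch: the function name `count_standard_triples` and the sample triples are made up for illustration, and it works on plain (subject, predicate, object) tuples rather than on CONVEX's own lookup code.

```python
from typing import Iterable, Tuple

# URI prefixes of the standard Wikidata format mentioned in this thread.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"
DIRECT_PROP_PREFIX = "http://www.wikidata.org/prop/direct/"

Triple = Tuple[str, str, str]


def count_standard_triples(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Return (matching, total): how many triples use the expected prefixes."""
    matching = 0
    total = 0
    for subject, predicate, _obj in triples:
        total += 1
        if subject.startswith(ENTITY_PREFIX) and predicate.startswith(DIRECT_PROP_PREFIX):
            matching += 1
    return matching, total


# Hypothetical sample: the first triple follows the expected format,
# the second uses different URI patterns and would not be found by lookups
# that rely on the standard prefixes.
sample = [
    (
        "http://www.wikidata.org/entity/Q123",
        "http://www.wikidata.org/prop/direct/P123",
        "http://www.wikidata.org/entity/Q456",
    ),
    (
        "https://www.wikidata.org/wiki/Q123",
        "http://www.wikidata.org/prop/P123",
        "some value",
    ),
]

matching, total = count_standard_triples(sample)
print(f"{matching} of {total} triples use the expected URI patterns")
```

In practice the triples would come from whichever HDT reader is in use. If almost none of them match, empty KG lookups are the likely explanation for identical results.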
I am also using a smaller dump of Wikidata, so the format is the same. How do you know that the results from the KG lookups are empty?
No, results are also the same across domains. In any case the results stay the same:
    MRR_score: (1792, 0.09821428571428571, 176.0)
    P@1: (1792, 0.08928571428571429, 160.0)
    H@5: (1792, 0.10714285714285714, 192.0)
So if I only answer the existentials, results for books and movies are:
    MRR_score: (1792, 0.09821428571428571, 176.0)
    P@1: (1792, 0.08928571428571429, 160.0)
    H@5: (1792, 0.10714285714285714, 192.0)
It's just a coincidence that these two results are the same.
For the soccer domain, results (if I only answer the existentials) are:
    MRR_score: (1792, 0.07142857142857142, 128.0)
    P@1: (1792, 0.05357142857142857, 96.0)
    H@5: (1792, 0.08928571428571429, 160.0)
For the tv_series domain, results (if I only answer the existentials) are:
    MRR_score: (1792, 0.05357142857142857, 96.0)
    P@1: (1792, 0.05357142857142857, 96.0)
    H@5: (1792, 0.05357142857142857, 96.0)
Could you run the method on the soccer and tv_series domain, and check if your results match with mine, please?
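For readers comparing numbers: each reported tuple appears to be (number of questions, average score, summed score). That reading is an assumption, but it matches the arithmetic, e.g. 176.0 / 1792 = 0.0982... for MRR on books. The sketch below shows how such tuples could be computed from the rank of the first correct answer per question. The helper names and the example ranks are hypothetical and not taken from the CONVEX code.

```python
from typing import List, Optional, Tuple


def summarize(scores: List[float]) -> Tuple[int, float, float]:
    """Return (count, average, sum), the assumed layout of the reported tuples."""
    if not scores:
        return 0, 0.0, 0.0
    total = sum(scores)
    return len(scores), total / len(scores), total


def reciprocal_rank(rank: Optional[int]) -> float:
    """1/rank of the first correct answer, or 0 if no correct answer was returned."""
    return 0.0 if rank is None else 1.0 / rank


def precision_at_1(rank: Optional[int]) -> float:
    """1 if the top-ranked answer is correct, else 0."""
    return 1.0 if rank == 1 else 0.0


def hit_at_5(rank: Optional[int]) -> float:
    """1 if a correct answer appears among the top five, else 0."""
    return 1.0 if rank is not None and rank <= 5 else 0.0


# Hypothetical 1-based ranks; None means no correct answer was returned.
ranks = [1, 2, None, 5, None, None, 7, 1]

mrr = summarize([reciprocal_rank(r) for r in ranks])
p_at_1 = summarize([precision_at_1(r) for r in ranks])
h_at_5 = summarize([hit_at_5(r) for r in ranks])

print("MRR_score:", mrr)
print("P@1:", p_at_1)
print("H@5:", h_at_5)
```

Under this reading, a run in which only a fixed subset of questions is answered yields the same sums regardless of the hyperparameters, which is consistent with the identical numbers reported above.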
You're right. I didn't manage to run it for soccer domain (it threw an error) but the numbers for tv_series are the same as yours. So, as you said, there are no facts fetched from the KG. I used a smaller dump of Wikidata from http://gaia.infor.uva.es/hdt/. Do you have any advice for me? I need a smaller set of Wikidata to test your model on it because the complete wikidata dump is too huge and I cannot use it.
Thanks!
> I didn't manage to run it for soccer domain (it threw an error)
I will try to include a setting to disable caching in CONVEX, since it seems quite relevant for plugging in another KG (which was not really intended initially).
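As a rough illustration of what such a setting could look like (this is not CONVEX's implementation): the helper `load_cache`, the `use_cache` flag, the JSON file format, and the file name below are all assumptions made for the sketch.

```python
import json
import os
from typing import Any, Dict


def load_cache(path: str, use_cache: bool = True) -> Dict[str, Any]:
    """Load a cached dictionary from `path`.

    Returns an empty dict when caching is disabled or the file is missing,
    so the calling code can run against another KG without stale entries.
    """
    if not use_cache or not os.path.isfile(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical usage: the file name is a placeholder, not an actual CONVEX path.
label_dict = load_cache("data/label_dict.json", use_cache=False)
print(len(label_dict))
```

With a flag like this, switching to another KG would not require deleting files or commenting out the loading code by hand.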
> Do you have any advice for me? I need a smaller set of Wikidata to test your model on it because the complete wikidata dump is too huge and I cannot use it.
That depends heavily on your purpose of using CONVEX and the reason why the Wikidata dump is too large for you to use.
E.g.
Without knowing your specific constraint(s) and motivation, it is hard to give suggestions. Feel free to share your purposes via mail.
Thanks for your help! I will try to extract only the part from Wikidata which is relevant for the domain.
You're welcome. No problem.
Hi,
My question is related to https://github.com/PhilippChr/CONVEX/issues/2 but as it is closed I have to ask again. I get the same results regardless of the domain and parameters set in settings.json. As you mentioned, I removed the cached data (by deleting files from the data folder), but the code does not run without these files. So I removed the parts of the code that load those files. Now the results are different from before, but they still remain the same if I change the parameters. I was wondering if you could help me resolve this issue.
Thanks in advance!