PhilippChr / CONVEX

Code for our CIKM 2019 paper. As far as we know, CONVEX is the first unsupervised method for conversational question answering over knowledge graphs. A demo and our benchmark (and more) can be found at
https://convex.mpi-inf.mpg.de/
MIT License

Results not changing #4

Closed mahsash closed 4 years ago

mahsash commented 4 years ago

Hi,

My question is related to https://github.com/PhilippChr/CONVEX/issues/2, but as that issue is closed, I have to repeat it here. I get the same results regardless of the domain and the parameters set in settings.json. As you suggested there, I removed the cached data (by deleting the files from the data folder), but the code does not run without these files. So I removed the parts of the code that load those files. The results are now different from before, but they still stay the same when I change the parameters. I was wondering if you could help me resolve this issue.

Thanks in advance!

PhilippChr commented 4 years ago

Hi,

Could you share your settings.json file, please? Removing the cache was only necessary in #2 because that user wanted to use another KG. If you only want results for another domain, you don't need to do this.

Regards, Philipp

mahsash commented 4 years ago

Thanks for your reply.

I'm also using another KG (a small subset of Wikidata). This is my settings.json file:

```json
{
  "tagMe_token": "MyToken",
  "hyperparameters_frontier_detection": [0.6, 0.3, 0.1],
  "hyperparameters_answer_detection": [0.9, 0.1],
  "number_of_frontier_nodes": 3,
  "domain": "movies",
  "domain_options": ["books", "movies", "music", "soccer", "tv_series", "ALL"],
  "conversations_path": "data/test_set/test_set_ALL.json",
  "wikidata_dump_path": "/wikidata.hdt",
}
```

mahsash commented 4 years ago

And I commented out the first part of the code in wikidata.py that loads cached data and changed it to:

```python
identifier_predicates = {}
label_dict = {}
predicate_frequencies_dict = {}
entity_frequencies_dict = {}
statements_dict = {}
```

PhilippChr commented 4 years ago

CONVEX assumes the standard Wikidata format for the HDT file, with triples like `"http://www.wikidata.org/entity/Q123 http://www.wikidata.org/prop/direct/P123 *"`. My guess is that no facts are fetched from the KG in your case because the KG lookups return empty results. Then only the existential questions are answered, and since the hyperparameters play no role for these, the results remain the same for different settings.
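For example, a quick sanity check of the URI scheme could look like the sketch below. The helper functions are mine, not part of CONVEX; the commented lookup assumes the pyHDT package, but any HDT reader works.

```python
# Sketch (not CONVEX code): verify that your HDT dump uses the
# standard Wikidata URI scheme, so that entity/predicate lookups
# such as Q123 / P123 can actually match triples in the file.

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
PREDICATE_PREFIX = "http://www.wikidata.org/prop/direct/"

def entity_uri(qid):
    """Full URI CONVEX expects for an entity id like 'Q123'."""
    return ENTITY_PREFIX + qid

def predicate_uri(pid):
    """Full URI CONVEX expects for a predicate id like 'P123'."""
    return PREDICATE_PREFIX + pid

# With pyHDT (an assumption -- substitute your own HDT reader),
# a sanity check could be:
#
#   from hdt import HDTDocument
#   doc = HDTDocument("/wikidata.hdt")
#   triples, count = doc.search_triples(entity_uri("Q123"), "", "")
#   # count == 0 for entities you expect means lookups come back empty
```

If such lookups return zero matches for entities that should be in your subset, the dump's URI scheme likely differs from what CONVEX expects.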

However, the results should definitely change if you switch the domain; I also verified this once more. Are your results the same across all domains, or do they stay the same for different parameters within one domain?

mahsash commented 4 years ago

I am also using a smaller dump of Wikidata, so the format is the same. How do you know that the results from the KG lookups are empty?

The results are also the same across domains. In every case I get:

```
MRR_score: (1792, 0.09821428571428571, 176.0)
P@1:       (1792, 0.08928571428571429, 160.0)
H@5:       (1792, 0.10714285714285714, 192.0)
```

PhilippChr commented 4 years ago

So if I only answer the existentials, the results for both books and movies are:

```
MRR_score: (1792, 0.09821428571428571, 176.0)
P@1:       (1792, 0.08928571428571429, 160.0)
H@5:       (1792, 0.10714285714285714, 192.0)
```

It's just a coincidence that these two results are the same.

For the soccer domain, the results (existentials only) are:

```
MRR_score: (1792, 0.07142857142857142, 128.0)
P@1:       (1792, 0.05357142857142857, 96.0)
H@5:       (1792, 0.08928571428571429, 160.0)
```

For the tv_series domain, the results (existentials only) are:

```
MRR_score: (1792, 0.05357142857142857, 96.0)
P@1:       (1792, 0.05357142857142857, 96.0)
H@5:       (1792, 0.05357142857142857, 96.0)
```
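Reading the numbers off the logs, each tuple appears to be (number of questions, mean score, summed score) -- e.g. 160 / 1792 = 0.0893 for P@1. This layout is my inference from the values, not documented CONVEX output; a minimal helper that reproduces it:

```python
def summarize(scores):
    """Aggregate per-question scores into the (n, mean, total)
    tuple style seen in the logs above."""
    total = sum(scores)
    return (len(scores), total / len(scores), total)

# P@1 for the books/movies run: 160 of 1792 questions have the
# correct answer at rank 1 (score 1.0), the rest score 0.0.
per_question = [1.0] * 160 + [0.0] * (1792 - 160)
print(summarize(per_question))
# -> (1792, 0.08928571428571429, 160.0)
```

For existential (yes/no) questions the reciprocal rank is also either 0 or 1, which is why MRR, P@1, and H@5 can coincide, as in the tv_series numbers.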

Could you run the method on the soccer and tv_series domain, and check if your results match with mine, please?

mahsash commented 4 years ago

You're right. I didn't manage to run it for the soccer domain (it threw an error), but the numbers for tv_series are the same as yours. So, as you said, no facts are fetched from the KG. I used a smaller dump of Wikidata from http://gaia.infor.uva.es/hdt/ . Do you have any advice for me? I need a smaller subset of Wikidata to test your model on, because the complete Wikidata dump is too large for me to use.

Thanks!

PhilippChr commented 4 years ago

> I didn't manage to run it for soccer domain (it threw an error)

I will try to include a setting to disable caching in CONVEX, since plugging in another KG seems to be quite a relevant use case (it was not really intended initially).
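Until then, a minimal fallback pattern could be used in place of the unconditional cache loading: load the cache file if it exists, otherwise start from an empty dict. The file path and function name here are hypothetical, not CONVEX's actual code.

```python
import json
import os

def load_cache(path):
    """Return the cached dict if the cache file exists; otherwise an
    empty dict, so the code also runs without pre-built caches
    (e.g. when plugging in a different KG)."""
    if os.path.isfile(path):
        with open(path) as f:
            return json.load(f)
    return {}

# Hypothetical usage in wikidata.py:
label_dict = load_cache("data/label_dict.json")
```

This avoids deleting code: with no cache file present, the dicts simply start empty and are filled by fresh KG lookups.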

> Do you have any advice for me? I need a smaller set of Wikidata to test your model on it because the complete wikidata dump is too huge and I cannot use it.

That depends heavily on your purpose for using CONVEX and on why the complete Wikidata dump is too large for you to use.


Without knowing your specific constraint(s) and motivation, it is hard to give suggestions. Feel free to share your purposes via mail.

mahsash commented 4 years ago

Thanks for your help! I will try to extract only the part of Wikidata that is relevant for the domain.
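For reference, one way such an extraction could be sketched: filter an N-Triples export of Wikidata down to triples whose subject is a domain-relevant entity. The QIDs below are placeholders, and how to pick them (e.g. all instances of "film") is a separate step; a tool like rdf2hdt can convert the filtered file back to HDT.

```python
# Sketch with placeholder data: keep only N-Triples lines whose
# subject entity belongs to a chosen set of domain QIDs.

RELEVANT_QIDS = {"Q11424", "Q2526255"}  # hypothetical movie-domain entities
ENTITY_PREFIX = "<http://www.wikidata.org/entity/"

def keep(line):
    """True if the triple's subject is one of the relevant entities."""
    subject = line.split(" ", 1)[0]
    return (subject.startswith(ENTITY_PREFIX)
            and subject[len(ENTITY_PREFIX):-1] in RELEVANT_QIDS)

def filter_dump(in_path, out_path):
    """Stream the dump line by line, writing only relevant triples."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if keep(line):
                dst.write(line)
```

Streaming line by line keeps memory use constant, which matters when the input dump is hundreds of gigabytes.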

PhilippChr commented 4 years ago

You're welcome. No problem.