ProjetPP / PPP-QuestionParsing-Grammatical

Question Parsing module for the PPP using a grammatical approach
GNU Affero General Public License v3.0

More synonyms for "where" #144

Closed waldyrious closed 9 years ago

waldyrious commented 9 years ago

I'm not sure if the list of properties here has a direct relationship with Wikidata items, but there's the property "located in the administrative territorial entity" (P131) which is much more accurate than, say, "country" (P17).

Is there a way it could be added?

yhamoudi commented 9 years ago

Thank you for your comment :)

This list contains the properties that are relevant for each question word. It's designed for querying databases in general (not specifically Wikidata).

We try to keep this list concise and accurate (we hope to cover all the possible properties linked to a given question word). Your example is not covered, so we need to add a new property to our list. However, I'm not sure that "located in the administrative territorial entity" is the best thing to add (it's very specific). Why not add city, town, state and locality instead, which are all aliases of "located in the administrative territorial entity" and of many other properties?

waldyrious commented 9 years ago

Of course, as long as those get mapped to P131 somewhere along the chain :) so I assume it is possible, from what you're saying. How can I contribute to that mapping? I couldn't figure out where it lives.

yhamoudi commented 9 years ago

You just need to add the new properties into the list. I did it here: https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/commit/65742217a95f51c84377d8a831aa68259addf3cd

As you can see, "where", "in which" and "from which" will now produce 4 more properties (city, town, state and locality). For each question word, I've updated 2 maps:

(and I've also updated a unit test: https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/commit/8efe7e6953311e163f25fb1d19c7b7da617be93b)
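For readers following along, here is a minimal Python sketch of what that change amounts to for "where". The `questionWIs` name and the resulting list come from this thread; the exact shape of the real dictionary in `ppp_questionparsing_grammatical/data/questionWord.py` and the `predicates_for` helper are assumptions made for illustration only.

```python
# Illustrative sketch only: the real data lives in
# ppp_questionparsing_grammatical/data/questionWord.py and may be shaped differently.

# Predicates produced for "where" before the fix (as quoted later in this thread).
WHERE_BEFORE = ['place', 'location', 'residence', 'country']

# The four properties added by the fix.
ADDED = ['city', 'town', 'state', 'locality']

# Hypothetical stand-in for the questionWIs map, restricted to "where".
question_wis = {'where': WHERE_BEFORE + ADDED}


def predicates_for(question_word):
    """Return the candidate predicates for a question word (empty list if unknown)."""
    return question_wis.get(question_word.lower(), [])


print(predicates_for('Where'))
```

The lists for "in which" and "from which" are left out because their previous contents are not given here.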

yhamoudi commented 9 years ago

I'm closing the issue since it's fixed, but feel free to ask if you have any questions :)

waldyrious commented 9 years ago

I don't think I fully understood where the code that converts those terms into Wikidata properties lives, but I suppose that's on https://github.com/ProjetPP/PPP-Wikidata?

Tpt commented 9 years ago

I don't think I fully understood where the code that converts those terms into Wikidata properties lives, but I suppose that's on https://github.com/ProjetPP/PPP-Wikidata?

Yes, it is there. The PPP-Wikidata module does the conversion by using property labels and aliases.
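To make the idea of "conversion by property labels and aliases" concrete, here is a small Python sketch. It is not the PPP-Wikidata implementation (that module is written in PHP); the `PROPERTIES` table, `build_index` and `resolve` are hypothetical names, and the table only contains the labels and aliases mentioned in this thread.

```python
from collections import defaultdict

# Toy property table. The labels and aliases are the ones named in this thread;
# the data structure itself is an assumption made for illustration.
PROPERTIES = {
    'P131': {
        'label': 'located in the administrative territorial entity',
        'aliases': ['city', 'town', 'state', 'locality'],
    },
    'P17': {'label': 'country', 'aliases': []},
}


def build_index(properties):
    """Map every lowercased label or alias to the property IDs that carry it."""
    index = defaultdict(list)
    for property_id, info in properties.items():
        for name in [info['label']] + info['aliases']:
            index[name.lower()].append(property_id)
    return index


INDEX = build_index(PROPERTIES)


def resolve(term):
    """Return the property IDs whose label or alias matches the term."""
    return INDEX.get(term.lower(), [])


print(resolve('city'))     # matches an alias of P131
print(resolve('country'))  # matches the label of P17
```

A term such as "place" resolves to nothing in this toy table, which is the kind of mismatch that motivated this issue.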

waldyrious commented 9 years ago

So... where is that in the source of PPP-Wikidata? In what file? The source code is kinda hard (for me) to navigate :(

Tpt commented 9 years ago

The source code is here: https://github.com/ProjetPP/PPP-Wikidata/blob/master/src/ValueParsers/WikibaseEntityIdParser.php

To have an overview of the different namespaces of the module:

waldyrious commented 9 years ago

Ah, I think I finally got it. Let me know if this understanding of what is happening is correct:

Feel free to correct any mistaken assumptions I've made in describing the process above.

Regardless, I gotta say, that was quite a ride. Are you guys sure that the system must make so many jumps? It would help if there was a simple guide that would allow people to contribute to make the question answering engine more powerful (like how I initially wanted to make it able to specify locations in a more comprehensive manner, leading to opening this issue). Are there plans to make such a contribution guide?

Ezibenroc commented 9 years ago

The module PPP-QuestionParsing-Grammatical uses the questionWIs dictionary in ppp_questionparsing_grammatical/data/questionWord.py to convert "where" into the array of values ['place', 'location', 'residence', 'country', 'city', 'town', 'state', 'locality']

Yes.

I'll let @Tpt answer the questions about the Wikidata module.


Regardless, I gotta say, that was quite a ride. Are you guys sure that the system must make so many jumps?

I think you got lost in the modularity of the software. It may seem complicated, but I believe this is one of the strengths of PPP. This page might help you to understand.

It would help if there was a simple guide that would allow people to contribute to make the question answering engine more powerful (like how I initially wanted to make it able to specify locations in a more comprehensive manner, leading to opening this issue). Are there plans to make such a contribution guide?

Good idea. A first guide for you: recall that there is a normal form handled by all the modules. For instance, "Where is the Eiffel Tower?" may be transformed into (Eiffel Tower, location, ?). You should read it as “to retrieve the answer to this question, I must go to the Wikidata entity of the Eiffel Tower and then look at the property location”. This normal form can be observed by clicking on the "show internal results" button on askplatypus.
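As a rough illustration of that normal form, the Python sketch below models a triple with a hole and looks it up in a toy knowledge base. The `Triple` class, `render`, `answer` and `TOY_KB` are hypothetical names for this example and not the actual PPP data model; the knowledge base content is illustrative.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A (subject, predicate, object) triple; None stands for the "?" hole."""
    subject: Optional[str]
    predicate: Optional[str]
    obj: Optional[str]


def render(triple):
    """Format a triple the way it is written in this thread."""
    return '(' + ', '.join('?' if part is None else part for part in triple) + ')'


# Toy knowledge base keyed by (entity, property); illustrative data only.
TOY_KB = {('Eiffel Tower', 'location'): ['Paris']}


def answer(triple, kb):
    """Fill the object hole by looking up the subject's property in the toy KB."""
    if triple.obj is not None:
        return [triple.obj]
    return kb.get((triple.subject, triple.predicate), [])


question = Triple('Eiffel Tower', 'location', None)
print(render(question))
print(answer(question, TOY_KB))
```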

If you are unhappy with the answer given by Platypus on such questions, there are several possibilities:

  1. The normal form does not seem right to you. The problem certainly comes from one of the QuestionParsing modules. Since there is only one deployed for the moment (QuestionParsing-Grammatical), the problem must come from this module.
  2. The normal form seems ok. The problem certainly comes from one of the backend modules. If the information necessary to answer the question is contained in Wikidata, the problem must come from the Wikidata module.
  3. There is no normal form or you don't know if it is ok. The problem might come from any module. If you are lost, you can post on our forum.

In your example, you were unhappy with the answer to "Where is the Eiffel Tower": you wanted the answer "7th arrondissement of Paris". The normal form for this one was (Eiffel Tower,[place,location,residence,country],?). The possible predicates are [place,location,residence,country], and none of these corresponds to P131, the property you are looking for. Therefore, the problem came from the QuestionParsing module: we had to add one of the aliases of P131 for the location questions.
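The diagnosis above can be sketched in Python as a simple overlap check between the predicates in the normal form and the names of the wanted property. `P131_NAMES`, `matching_predicates` and `likely_culprit` are hypothetical helpers for illustration; the alias set only lists the names mentioned in this thread.

```python
# Label and aliases of P131 as named in this thread (not a complete list).
P131_NAMES = {
    'located in the administrative territorial entity',
    'city', 'town', 'state', 'locality',
}

# Predicates in the normal form before and after the fix.
BEFORE = ['place', 'location', 'residence', 'country']
AFTER = BEFORE + ['city', 'town', 'state', 'locality']


def matching_predicates(predicates, wanted_names):
    """Return, sorted, the predicates that name the wanted property."""
    return sorted(set(predicates) & set(wanted_names))


def likely_culprit(predicates, wanted_names):
    """Rough triage: no matching predicate points at the question parsing step."""
    if not matching_predicates(predicates, wanted_names):
        return 'question parsing'
    return 'backend'


print(likely_culprit(BEFORE, P131_NAMES))
print(matching_predicates(AFTER, P131_NAMES))
```

With the old predicate list the check points at question parsing; once a matching predicate is present, any remaining problem would have to be looked for in a backend module.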

Hope this helped :)

Ezibenroc commented 9 years ago

I opened an issue for a tutorial for new contributors: https://github.com/ProjetPP/Documentation/issues/65

waldyrious commented 9 years ago

@Ezibenroc that's a great start, thank you! I'll be tracking the Documentation project closely :)

By the way, I think the "Technical overview" link in the main project page should point to the documentation repo (to prevent duplication of work and/or the need to keep both sources in sync).