Improve datamodel

yhamoudi commented 9 years ago

Some minor modifications
An important change: (la, lb, lc) is true iff for all a ∈ la, there exists b ∈ lb such that for all c ∈ lc, (a, b, c) is true. If we don't do this, we cannot use alternative words in triples (otherwise full triples will almost always be false...).

Ezibenroc commented 9 years ago

:+1:

Tpt commented 9 years ago

+1 for the spelling fixes.

After reflection, I'm also OK with changing the predicate quantifier from forall to exists, because it makes sense, imho, to say that (A, B, C) is true if, and only if, forall a, c there is a relation between a and c in B.

@s-i-newton Your opinion?

yhamoudi commented 9 years ago

> (A, B, C) is true if, and only if, forall a, c there is a relation between a and c in B.

That's not exactly what I mean:

my proposition: (A,B,C) true <=> ∀ a∈A, ∃ b∈B, ∀ c∈C, (a,b,c) true
yours: (A,B,C) true <=> ∀ a∈A, ∀ c∈C, ∃ b∈B, (a,b,c) true

I don't know which one is best.

Ezibenroc commented 9 years ago

([France], [president, capital], [François Hollande, Paris]) is false with @yhamoudi's proposition, and true with @Tpt's proposition. @yhamoudi's proposition seems to be better. But maybe there are also counter-examples for it.
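
To make the difference between the two quantifier orders concrete, here is a minimal Python sketch that evaluates the example above. The toy fact set and the function names are assumptions made up for this illustration; they are not part of the datamodel or of any PPP module.

```python
from itertools import product

# Toy knowledge base: an assumption for illustration only.
FACTS = {
    ("France", "president", "François Hollande"),
    ("France", "capital", "Paris"),
}

def holds(a, b, c):
    """Truth of an atomic triple (a, b, c) in the toy knowledge base."""
    return (a, b, c) in FACTS

def full_forall_exists_forall(A, B, C):
    """@yhamoudi: for all a, there exists b such that for all c, (a, b, c)."""
    return all(any(all(holds(a, b, c) for c in C) for b in B) for a in A)

def full_forall_forall_exists(A, B, C):
    """@Tpt: for all a and all c, there exists b such that (a, b, c)."""
    return all(any(holds(a, b, c) for b in B) for a, c in product(A, C))

A = ["France"]
B = ["president", "capital"]
C = ["François Hollande", "Paris"]

print(full_forall_exists_forall(A, B, C))  # False: no single b links France to both objects
print(full_forall_forall_exists(A, B, C))  # True: each object is reached by some b
```

The only difference between the two functions is where `any` (the ∃ over predicates) sits relative to the `all` over objects.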

yhamoudi commented 9 years ago

If we consider that lists of predicates are only multiple attempts to nounify a verb, we should say that only one of them is supposed to be true for all triples: (A,B,C) true <=> ∃ b∈B, ∀ a∈A, ∀ c∈C, (a,b,c) true (i.e. the same link exists between all the elements of A and C).

In the same way, we could change:

currently: (A,B,?) = {c / ∃ a∈A, ∃ b∈B, (a,b,c) true}
change: D = { b∈B / ∀ a∈A, ∃ c∈C, (a,b,c) true} and then (A,B,?) = {c / ∃ a∈A, ∃ d∈D, (a,d,c) true}

And:

currently: (?,B,C) = {a / ∃ b∈B, ∃ c∈C, (a,b,c) true}
change: (?,B,C) = biggest set E such that ∃ b∈B, ∀ e∈E, ∃ c∈C, (e,b,c) true
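
As a rough illustration of the (A,B,?) change, the Python sketch below compares the current definition with the proposed one. The facts, the function names and the choice of letting c range over every object of the toy fact set are all assumptions made for this example.

```python
# Toy facts: an assumption for illustration only.
FACTS = {
    ("Obama", "birth place", "Honolulu"),
    ("Obama", "birth date", "1961-08-04"),
    ("Hollande", "birth place", "Rouen"),
}

# Assumption: in a triple with a hole, c ranges over every object in the toy facts.
ALL_OBJECTS = {c for (_, _, c) in FACTS}

def hole_current(A, B):
    """Current (A, B, ?): every c reached from some a through some b."""
    return {c for (a, b, c) in FACTS if a in A and b in B}

def hole_proposed(A, B):
    """Proposed (A, B, ?): keep only the predicates that give a result for every a."""
    D = [b for b in B
         if all(any((a, b, c) in FACTS for c in ALL_OBJECTS) for a in A)]
    return hole_current(A, D)

A = ["Obama", "Hollande"]
B = ["birth place", "birth date"]

print(sorted(hole_current(A, B)))   # ['1961-08-04', 'Honolulu', 'Rouen']
print(sorted(hole_proposed(A, B)))  # ['Honolulu', 'Rouen']
```

With these toy facts, "birth date" is dropped by the proposed definition because it has no value for one of the subjects, so its results disappear from the answer.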

Tpt commented 9 years ago

@yhamoudi "If we consider that lists of predicates are only multiple attempts to nounify a verb": we really should avoid being so tied to an algorithm, in order to stay as generic and stable as possible. It isn't an acceptable semantics for the predicate parameter of the triple node to say "it's the result of the nounification of a verb": we can't assume that all code that outputs questions encoded in the data model will be an NLP tool.

Your change will really increase the complexity of the semantics of triples with holes, and I don't think it's a good idea: why should we prefer the predicate that gives the most results over the others? Maybe another predicate that gives fewer but more accurate results was the right one. So this distinction is maybe not the right one.

About the change to full triples: I liked the three "for all" because they show that the three arguments (subject, predicate and object) were more or less "symmetric". But, as there is a real use case, I'm OK with changing the definition if we find another one that has a nice and easy-to-understand semantics.

In fact I think the whole debate is around the question: do we have "nice and accurate predicates" (the current assumption of the data model) or "crappy words" (the way you want to go, because the nounification algorithms can't do better)? It's a nearly fundamental change of mindset. So I think this pull request opens, in fact, a far bigger question than just changing the semantics of a node.

@Ezibenroc "proposition seems to be better" -> I'm not sure that a single example is a good argument for preferring one definition over another. I would be very happy to agree with your point of view if you give such a semantics.

yhamoudi commented 9 years ago

> "If we consider that lists of predicates are only multiple attempts to nounify a verb": we really should avoid being so tied to an algorithm, in order to stay as generic and stable as possible. It isn't an acceptable semantics for the predicate parameter of the triple node to say "it's the result of the nounification of a verb": we can't assume that all code that outputs questions encoded in the data model will be an NLP tool.

It's relevant to have lists for subjects or objects. But is it natural/necessary to have a list for predicates (when predicates are not seen as "alternatives")? If I want the birth date + the birth place of Obama, I split the question into 2 parts (same problem as before: should we handle "Who is the president of France and the capital of China?") or I use the normal form (Obama, birth place, ?) ∪ (Obama, birth date, ?).

Predicates are (almost) the only part of the normal form where new words can appear (i.e. words that are not in the initial question). All the algorithms that produce normal forms from questions need more "freedom" on this kind of node because they have to guess what they should add. Other modules that output questions and want to have several predicates in a node just have to split the triple with ∪.

> In fact I think the whole debate is around the question: do we have "nice and accurate predicates" (the current assumption of the data model) or "crappy words" (the way you want to go, because the nounification algorithms can't do better)? It's a nearly fundamental change of mindset. So I think this pull request opens, in fact, a far bigger question than just changing the semantics of a node.

If "nice and accurate predicates" means having exactly the right predicate, it seems very difficult (What does a platypus eat? > eat = diet, What can you eat at a fast-food restaurant? > eat = food). We do not restrict the expressiveness of the datamodel if we consider lists of predicates as alternatives (just use ∪ if you want (Obama, [birth place, birth date], ?)), so where is the problem?
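
The claim that ∪ keeps the same expressiveness can be checked on a small example. This is only a sketch: the facts and the `hole` helper are hypothetical and implement the current (A,B,?) definition quoted earlier in the thread.

```python
# Toy facts: an assumption for illustration only.
FACTS = {
    ("Obama", "birth place", "Honolulu"),
    ("Obama", "birth date", "1961-08-04"),
}

def hole(A, B):
    """Current (A, B, ?): {c | exists a in A, exists b in B, (a, b, c) true}."""
    return {c for (a, b, c) in FACTS if a in A and b in B}

# One triple with a list of two predicates...
merged = hole(["Obama"], ["birth place", "birth date"])

# ...versus the union of two triples with a single predicate each.
split = hole(["Obama"], ["birth place"]) | hole(["Obama"], ["birth date"])

print(merged == split)  # True
```

Since the current definition only uses ∃ over B, splitting a predicate list into a union of single-predicate triples gives the same set of answers.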

> Your change will really increase the complexity of the semantics of triples with holes, and I don't think it's a good idea: why should we prefer the predicate that gives the most results over the others? Maybe another predicate that gives fewer but more accurate results was the right one. So this distinction is maybe not the right one.

I don't know whether it's relevant to also change the way we consider triples with holes. But we must have the same policy for triples with and without holes. If we agree that predicates are lists of alternatives for full triples, then it must also apply to triples with holes (and possibly change the way we evaluate them, because at least one triple is supposed to exist for all the subjects).

Another possibility is to allow only one predicate (per triple) in the datamodel and to introduce the notion of "alternative predicates" only in the implementation.

Tpt commented 9 years ago

"Who is the president of France and the capital of China?": Yes, we should definitely handle this kind of question. And it's already done with the clean normal form "(France, president, ?) ∪ (China, capital, ?)". I don't see what the link with the current problem is.

Ok. I buy your arguments for relaxing predicates and I'm ok to change the full triple specification.

Now, two options:

1. Define full triple as (A,B,C) <=> ∀ a∈A, ∃ b∈B, ∀ c∈C, (a,b,c) (your proposal). It requires changing the definition of triples with holes and so requires assuming that in B there is a "good" predicate (the b chosen) and that the others are "bad" ones.
2. Define full triple as (A,B,C) <=> ∀ a∈A, ∀ c∈C, ∃ b∈B, (a,b,c) (mine) and so don't change the semantics of triples with holes.

I prefer option 2 because:

- For triples with holes, which "b" should we choose? The one with the largest number of results for the current module? What if module x chooses one "b" and module y another? Should the core keep all results, or choose a "b" and remove the results from the other "b"s? Option 2 does not create this issue, as all "b"s are equal.
- It does not change the definition of triples with holes and so is a less disruptive change.

yhamoudi commented 9 years ago

> "Who is the president of France and the capital of China?": Yes, we should definitely handle this kind of question. And it's already done with the clean normal form "(France, president, ?) ∪ (China, capital, ?)". I don't see what the link with the current problem is.

Related to https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/issues/73. I don't think that representing this kind of question with a list of totally different predicates is a nice way to do it. That was an argument for defining lists of predicates as lists of alternatives.

Concerning the way we define full triples, neither of the 2 options totally convinces me, so I agree to choose the 2nd one because it has the least impact.

(The datamodel has been updated; those who don't agree must speak up or it will be merged as is.)

Ezibenroc commented 9 years ago

There is an asymmetry between full triples and triples with holes that I find strange. If you take lc=(la, lb, ?) then (la,lb,lc) is not necessarily true.
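
A small Python sketch of this asymmetry, assuming the full-triple definition of option 2 (∀ a, ∀ c, ∃ b) and the current definition of triples with holes. The facts and function names are made up for the example.

```python
from itertools import product

# Toy facts: an assumption for illustration only.
FACTS = {
    ("France", "capital", "Paris"),
    ("China", "capital", "Beijing"),
}

def hole(A, B):
    """(A, B, ?): {c | exists a in A, exists b in B, (a, b, c) true}."""
    return {c for (a, b, c) in FACTS if a in A and b in B}

def full(A, B, C):
    """Option 2: for all a and all c, there exists b such that (a, b, c)."""
    return all(any((a, b, c) in FACTS for b in B) for a, c in product(A, C))

la = ["France", "China"]
lb = ["capital"]
lc = hole(la, lb)

print(sorted(lc))        # ['Beijing', 'Paris']
print(full(la, lb, lc))  # False: (France, capital, Beijing) does not hold
```

The hole is filled with an existential over subjects, while the full triple asks for every subject/object pair, so the answer set of (la, lb, ?) does not have to satisfy (la, lb, lc).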

Tpt commented 9 years ago

+1 for the current version. I think we should wait for agreement from @s-i-newton and @ProgVal before merging.

Tpt commented 9 years ago

@Ezibenroc It's not a new problem. We should maybe relax the definition of full triples to 3 "exists" in order to get symmetry... But that's another topic.

Ezibenroc commented 9 years ago

> It's not a new problem. We should maybe relax the definition of full triples to 3 "exists" in order to get symmetry... But that's another topic.

Alright. +1 for the merge.

progval commented 9 years ago

:+1:

marc-chevalier commented 9 years ago

It's consistent and... all is alright.

Tpt commented 9 years ago

Yeah! \o/

ProjetPP / Documentation

Improve datamodel #48