ProjetPP / Documentation

Documentation and protocol specification of the Projet Pensées Profondes
Creative Commons Zero v1.0 Universal

Removes typing extension #57

Open Tpt opened 9 years ago

Tpt commented 9 years ago

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

yhamoudi commented 9 years ago

How do you type resources now?

Ezibenroc commented 9 years ago

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

Is it always ok? (I do not have any counter-example in mind)

Tpt commented 9 years ago

How do you type resources now?

An example: Bach ∩ (?, instance of, human)

Or (it's not valid in RDF but I think we can allow it in your data model) 1934 ∩ (?, instance of, date)
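To make the two examples above concrete, here is a minimal sketch of how a type constraint could be encoded as an intersection with an "instance of" triple. The helper names and the dictionary field names (`type`, `value`, `list`, and so on) are assumptions made for illustration, not the specified serialization.

```python
# Hypothetical helpers: the field names are assumptions, not the official spec.
def resource(value):
    return {"type": "resource", "value": value}

def missing():
    # The "?" placeholder of a triple.
    return {"type": "missing"}

def triple(subject, predicate, obj):
    return {"type": "triple", "subject": subject,
            "predicate": predicate, "object": obj}

def intersection(*operands):
    return {"type": "intersection", "list": list(operands)}

def typed(node, type_name):
    """Encode 'node has type type_name' as node ∩ (?, instance of, type_name)."""
    constraint = triple(missing(), resource("instance of"), resource(type_name))
    return intersection(node, constraint)

bach = typed(resource("Bach"), "human")   # Bach ∩ (?, instance of, human)
year = typed(resource("1934"), "date")    # 1934 ∩ (?, instance of, date)
```

With this encoding no dedicated type field is needed: the type is just one more operand of the intersection.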

yhamoudi commented 9 years ago

It's a bit ugly to join an instance of triple to each resource/missing. It's kind of the same problem as with inverse predicates: we can encode them with 2 triples ((?,a,b)∪(b,reverse(a),?)), but it's better to make a clearer distinction by adding a new field (inverse predicate / type).
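The two encodings of an inverse predicate compared above can be sketched as follows. The `Triple` class and its `inverse_predicate` field are hypothetical names chosen for this illustration; they only show that a dedicated field carries the same information as the two-triple union.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str
    # Hypothetical dedicated field holding reverse(a); None when absent.
    inverse_predicate: Optional[str] = None

def as_union(t):
    """Expand (?, a, b) with an inverse predicate into (?, a, b) ∪ (b, reverse(a), ?)."""
    if t.inverse_predicate is None:
        return [t]
    return [
        Triple(t.subject, t.predicate, t.object),
        Triple(t.object, t.inverse_predicate, t.subject),
    ]

compact = Triple("?", "a", "b", inverse_predicate="reverse(a)")
expanded = as_union(compact)
```

The compact form keeps the tree to a single node, while the expanded form needs a union of two triples to say the same thing.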

progval commented 9 years ago

:+1: @yhamoudi And it creates extra workload for module developers.

Ezibenroc commented 9 years ago

And it creates extra workload for module developers.

Yes. Types were optional information provided to improve precision. With this PR they would become mandatory...

Tpt commented 9 years ago

My basic point of view is: we should try to keep the data model as simple as possible so that it is easy to maintain and understand. I am afraid of a feature explosion in the data model that would make module creation very difficult (which is why I personally dislike the reverse predicates, whose only advantage (for the Wikidata module) is to reduce the tree size).

It's a bit ugly to join an instance of triple to each resource/missing.

Are you sure that you will be able to add type annotations to each resource/triple? IMHO we should only add them when you are sure they are relevant, i.e. when they are explicitly stated in the question, as in "Who is Bach", which would be rewritten as "Bach ∩ (?, instance of, person)", or "In which country is Paris", which would be rewritten as something like "(Paris, [located in, in, location], ?) ∩ (?, instance of, country)".

That is because I don't see how you can do good enough typing everywhere without real knowledge of the semantics of each word. For example, will you be able to understand that "mother" may be both a relationship and a movie? Another example: typing the output of "Where is Paris?" is very tricky.

But I would be very happy to be wrong on it, so feel free to convince me I'm wrong.

Side remark, because I believe it will come up again quickly: please do not parse "When is born X" as "(X, birth, ?) ∩ (?, instance of, date)", because it has no real meaning: the range of a "birth" predicate would usually be an event, and casting it to a date, whether through an intersection with "(?, instance of, date)" or through a type annotation, makes no semantic sense. Moreover, it makes simple module development far more difficult (the module needs to guess cleverly from a "birth" predicate and a "date" type that a "birth date" is what we are looking for).

And it creates extra workload for module developers.

Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and the JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.

progval commented 9 years ago

On 21/02/2015 17:34, Thomas Tanon wrote:

And it creates extra workload for module developers.

Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and the JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.

Because module developers would have to implement a simplification step that takes this intersection into account, or the module would return something that can't be used (an intersection of a resource and an instance-of triple).

Tpt commented 9 years ago

Because module developers would have to implement a simplification step that takes this intersection into account, or the module would return something that can't be used (an intersection of a resource and an instance-of triple).

It's exactly why I've proposed the filter based on value-type and @type.
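A rough sketch of what such a filter might look like on the library side is given below. It is only an illustration under stated assumptions: the function names are hypothetical, the `graph` field holding the JSON-LD document is assumed, the default value-type of `"string"` is assumed, and the wanted type is compared directly against the value-type or the JSON-LD `@type` without any mapping between vocabularies.

```python
def matches_type(resource, wanted):
    """Return True if a serialized resource satisfies (?, instance of, wanted)."""
    value_type = resource.get("value-type", "string")  # assumed default
    if value_type == "resource-jsonld":
        # Assumption: the JSON-LD document is stored under a "graph" key.
        types = resource.get("graph", {}).get("@type", [])
        if isinstance(types, str):
            types = [types]
        return wanted in types
    return value_type == wanted

def apply_instance_of(resources, wanted):
    """Keep only the candidate resources whose type matches the constraint."""
    return [r for r in resources if matches_type(r, wanted)]

candidates = [
    {"type": "resource", "value": "1685", "value-type": "time"},
    {"type": "resource", "value": "Bach", "value-type": "resource-jsonld",
     "graph": {"@type": "Person", "name": "Bach"}},
    {"type": "resource", "value": "Bach"},
]
times = apply_instance_of(candidates, "time")
people = apply_instance_of(candidates, "Person")
```

In this sketch a module only returns candidate resources; resolving the intersection with the "instance of" triple happens once, in the shared library, rather than in every module.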

yhamoudi commented 9 years ago

What is the difference between type and value-type? What is the JSON serialization of typing?

Tpt commented 9 years ago

What is the difference between type and value-type?

The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details.

What is the JSON serialization of typing?

The serialization of the type extension has not been specified yet.

yhamoudi commented 9 years ago

The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details

I was not clear. I was talking about the type from the data model (which is removed in this pull request) and the value-type from the serialization. But after re-reading the doc, I have no more questions on this.

And that because I don't see how you can do a good enough typing everywhere without real knowledge of the semantic of each word.

I know that Watson uses thousands of types and that it's an important feature, so they probably manage to perform very accurate typing.

And it creates extra workload for module developers.

It's exactly why I've proposed the filter based on value-type and @type.

I'm not sure that I understand this remark (especially the part about the "filter based on value-type and @type"). You say (?) that having 2 triples instead of 1 is better because 2 different modules can try to solve them. For instance, let's consider "What president was born in Italy". A module M1 knows who was born in Italy, but not who the presidents are. A module M2 knows who the presidents are but not their birthplaces.
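The M1/M2 scenario can be sketched as follows. Everything here is hypothetical: the module functions, the predicate names and the placeholder answers (`person_a`, `person_b`, `person_c`) are invented for illustration and do not reflect real data or the real module API.

```python
def module_m1(triple):
    # M1 only knows birthplaces.
    if triple == ("?", "birth place", "Italy"):
        return {"person_a", "person_b"}
    return None  # M1 cannot answer anything else

def module_m2(triple):
    # M2 only knows who the presidents are.
    if triple == ("?", "instance of", "president"):
        return {"person_b", "person_c"}
    return None

def solve_intersection(triples, modules):
    """Ask the modules about each triple separately, then intersect the answers."""
    result = None
    for t in triples:
        answers = next(
            (a for a in (m(t) for m in modules) if a is not None), None
        )
        if answers is None:
            return None  # no module could solve this operand
        result = answers if result is None else result & answers
    return result

question = [("?", "birth place", "Italy"), ("?", "instance of", "president")]
answer = solve_intersection(question, [module_m1, module_m2])
```

Neither module can answer the whole question alone, but because the type is an ordinary triple, each operand can be routed to the module that knows about it.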

Depending on the datamodel we have:

I agree that removing types will solve this kind of thing, but I'm not sure that it's the clean way to do it. Indeed, by the same reasoning there are a lot of other parts that could be split:

I think we have 3 possibilities:

I dislike the use of instance of for types because it looks like a "hack" to get types, instead of a clean way to do it. You say that we need to keep the data model as simple as possible, but if you use "instance of" as a way to type, you will need to explain the special role held by the predicate "instance of".

Moreover, I think that we should take into account the computation time needed to solve a question. When there are only 4-5 modules to query it's easy, but it could be more difficult if there were 100 modules. The shorter the normal form, the quicker the algorithm (there is a balance to find between the accuracy of the answer and the speed needed to obtain it).