HydraCG / Specifications

Specifications created by the Hydra W3C Community Group

Indicate a partial collection view is ordered #172

Closed pietercolpaert closed 3 years ago

pietercolpaert commented 5 years ago

Opening a difficult discussion here, yet I hope given a limited scope, we could set a good standard here:

Currently there is no way to indicate how a partial collection view is ordered. I therefore propose a way to add this, as in the following example:

<collection1> a hydra:Collection ;
              hydra:view [
                           hydra:orderedBy ( [
                                 a hydra:OrderDescription ;
                                 sh:path dbpedia-owl:birthDate ;
                                 hydra:orderOption hydra:ascending ;
                                 hydra:castTo xsd:Year
                           ]) ;
                           hydra:first <?page=2014>;
                           hydra:last <?page=2020>;
                           hydra:next <?page=2019>
                      ] .

This allows to have:

Mind that the goal is not to specify the order of objects within the JSON-LD documents. A client should always order the triples or objects itself once the document has been retrieved. The description only says something about how the next page relates to its previous page.

Related issue: https://github.com/HydraCG/Specifications/issues/6 — However, I do not want to introduce a hypermedia control to order a partial collection view, I just want a way to indicate how a partial collection view is ordered.

Are there other proposals of how to do this? Are there proposals that can overcome this limitation?

cristianvasquez commented 5 years ago

Hi, what is the purpose of hydra:manages[] ?

pietercolpaert commented 5 years ago

Hi, what is the purpose of hydra:manages[] ?

It’s an unstable draft proposal for partially indicating the contents of a hydra:Collection. See https://github.com/HydraCG/Specifications/blob/master/drafts/collection-representation.md

pietercolpaert commented 5 years ago

How can we proceed to get this as a draft proposal? Can someone show me the way?

Mec-iS commented 5 years ago

@pietercolpaert probably the most effective way is to participate in the Monday conf-call (there are reminders every week on Slack) and/or directly open a PR on the spec. Usually the first step is to gather consensus, so talking about your idea in the conf-call will help for sure.

alien-mcl commented 5 years ago

Sorry for the late reminder, but we've got this topic on today's call agenda

pietercolpaert commented 5 years ago

@alien-mcl I see the issue is put on the agenda of the 11th of June again. Will try to join that confcall!

pietercolpaert commented 4 years ago

I’ve updated the issue’s example with sh:path and corrected the manages block

asbjornu commented 4 years ago

Didn't we agree in #150 that manages should be renamed to memberAssertion?

pietercolpaert commented 4 years ago

I’m just using the latest spec as published here: https://www.hydra-cg.com/spec/latest/core/#manages-block

tpluscode commented 4 years ago

Didn't we agree that manages should be renamed

Well, only kind of. Care to submit a PR?

tpluscode commented 4 years ago

@pietercolpaert Let's not have the "manages block" distract us here ;)

I think it's good in general although introducing SHACL may be controversial even if it's inevitable.

Unclear how to handle deeper ordering

This may be tricky but looks like SHACL has rich support for paths.

(e.g., ordering on the birthDate of a parent of a foaf:Person managed by the collection)

sh:path ( ex:parent dbpedia-owl:birthDate )
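
To illustrate what such a sequence path would mean for a client that has to find the value to sort on, here is a minimal Python sketch. The dictionary-based graph, the `follow_path` helper and the `ex:alice`/`ex:bob` resources are assumptions made purely for illustration; none of them come from Hydra or SHACL.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory graph: (subject, predicate) -> list of objects.
Graph = Dict[Tuple[str, str], List[str]]


def follow_path(graph: Graph, start: str, path: List[str]) -> List[str]:
    """Evaluate a sequence path by following each predicate in turn."""
    current = [start]
    for predicate in path:
        next_nodes: List[str] = []
        for node in current:
            next_nodes.extend(graph.get((node, predicate), []))
        current = next_nodes
    return current


graph: Graph = {
    ("ex:alice", "ex:parent"): ["ex:bob"],
    ("ex:bob", "dbpedia-owl:birthDate"): ["1950-03-14"],
}

# Value(s) ex:alice would be ordered on under ( ex:parent dbpedia-owl:birthDate )
sort_values = follow_path(graph, "ex:alice", ["ex:parent", "dbpedia-owl:birthDate"])
```

A member without a matching parent or birth date yields an empty list, which is one of the cases an order description would have to say something about.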

Also let me ask you some questions:

  1. why orderOption? In most I'm aware of this would be called order direction
  2. what is the significance/usefulness of castTo?

pietercolpaert commented 4 years ago

@pietercolpaert Let's not have the "manages block" distract us here ;)

Yes, totally agree! For the sake of it, I’ll just remove it from the example!

I think it's good in general although introducing SHACL may be controversial even if it's inevitable.

Unclear how to handle deeper ordering

This may be tricky but looks like SHACL has rich support for paths.

(e.g., ordering on the birthDate of a parent of a foaf:Person managed by the collection)

sh:path ( ex:parent dbpedia-owl:birthDate )

Yes! That’s why I started using shacl:path. I think it’s the perfect solution to describe a property path!

Also let me ask you some questions:

  1. why orderOption? In most I'm aware of this would be called order direction

I must have been thinking about other options that are not just ascending or descending. For example, it could be that all the next pages are “east of” the current page.

  2. what is the significance/usefulness of castTo?

To express that if you’d have a date literal, you order only on the year and you cannot count on an ordering based on the months. This would also allow you to order based on quarter for example.

alien-mcl commented 4 years ago

Hi @pietercolpaert, guys

Sorry for late response - I'm very very busy with my work and heracles.net, but let me join your discussion.

I must have been thinking about other options that are not just ascending or descending. For example, it could be that all the next pages are “east of” the current page.

I think these can still be 'flattened' to a greater/smaller relation. In multi-dimensional sets you'd still have some kind of one-dimensional value to sort by, or you'd sort separately by each dimension. The "east of" example fits a WHERE clause better than an ordering.

To express that if you’d have a date literal, you order only on the year and you cannot count on an ordering based on the month. This would also allow you to order based on quarter for example

I somehow do not see how values sorted by date would have a different order in those cases. I believe the result of ordering by full date is still valid for ordering by quarter. I'm not sure I understand the issue with months: a month alone is not valid for ordering. It would be like ordering text values by their third character; it doesn't tell you anything. But I still see some room for this concept when it comes to how the value should be interpreted. I think string literals are a good place for some hints, e.g. in the SQL world a collation tells how to sort text values with specific encodings. This is the feature I'd give to orderOption.

As for SHACL, it has come up several times. I think the core vocabulary won't use it directly. Maybe we'll give some hints in the spec on how it fits the vocab. I don't want to prejudge anything though.

Regards

Karol

pietercolpaert commented 4 years ago

@alien-mcl three points I should respond to:

East of as an orderOption: keep it simple and one-dimensional

Ok, I don’t have a strong opinion on that. I’m OK with orderDirection, but we need to elaborate on how to order specific literals as well (such as strings: we might want to add the specific locale for example).

castTo predicate

Imagine you have 5 pages with members with dates in 2 years (the first 2 in 2015, the next 3 in 2016). Within the first 2 pages, you do not have any ordering, but you do know that the first 2 pages are lower in year than the next 3 pages. Important for clients looking for all members in January 2015: they will need to download 2 pages, not 1.
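
The five-page scenario can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Page` tuple, the `pages_to_fetch` helper and the `?page=N` URLs are hypothetical, and each page is assumed to hold members from a single year.

```python
from datetime import date
from typing import List, NamedTuple


class Page(NamedTuple):
    url: str   # hypothetical page URL
    year: int  # the only thing the year-level ordering guarantees for this page


def pages_to_fetch(pages: List[Page], wanted: date) -> List[str]:
    """Return every page that may contain members from the month of `wanted`.

    The order is only defined after casting to a year, so every page of
    that year is a candidate, whatever the month.
    """
    return [p.url for p in pages if p.year == wanted.year]


pages = [
    Page("?page=1", 2015), Page("?page=2", 2015),
    Page("?page=3", 2016), Page("?page=4", 2016), Page("?page=5", 2016),
]

# A client looking for January 2015 has to download both 2015 pages.
january_2015 = pages_to_fetch(pages, date(2015, 1, 1))
```

Had the ordering been declared on the full date instead, the client could stop after the first page whose dates pass January; the year-only cast is what tells it not to.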

For the castTo predicate, we could start from just the SPARQL specification on that, but we can leave it open for more options.

If the design of castTo is a blocker, we can leave it as a suggestion for now.

sh:path

If it’s not part of the core spec/lib, we do need a fall-back to indicate what exactly the ordering means. Do you have a suggestion for that?

alien-mcl commented 4 years ago

We had an interesting discussion on today's call regarding this issue. In general, there are two different approaches here:

We agreed that we shall come up with some cookbook examples and deliberate on the pros and cons of each approach. Both approaches will still need that "querying for specific order" mechanism, but this is somehow another part of the story.

tpluscode commented 4 years ago

@alien-mcl those are not exclusive approaches. We need both.

client knows what he asks for and server obeys (this is the one I personally support)

There is no way to enforce that the "server obeys". If the client requests to order by "first name" and "last name", the server has full authority to ignore either or both. The client would want to understand that they did not get exactly what they asked for.

Same with a request without explicit order like plain GET /people. If the server applied an order the client will not know about it without response metadata.

One more example is an implicit order which is more specific than what the client requested. Say the query was /events?orderBy=year. The server will do that, but it may implicitly add a second ordering by month name to keep the sorting stable for those items which have the same value for year. Otherwise those members would "jump around" within their section of the data, which can produce really weird results when the number of such elements is greater than the page size. Subsequent requests for the same page and order may even return completely different results.

So, the client would want to know that they requested /events?orderBy=year but effectively got /events?orderBy=year,month,name,id.
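
A small Python sketch of why the effective ordering matters. The event records and the `sort_events` helper are made up for illustration, and the two input lists stand in for a backend that returns rows in a different order on different requests.

```python
from typing import Dict, List

Event = Dict[str, object]


def sort_events(events: List[Event], keys: List[str]) -> List[Event]:
    """Sort events on the given properties, compared in the listed order."""
    return sorted(events, key=lambda e: tuple(e[k] for k in keys))


def ids(events: List[Event]) -> List[object]:
    return [e["id"] for e in events]


events_a: List[Event] = [
    {"id": 2, "name": "B", "month": 5, "year": 2019},
    {"id": 1, "name": "A", "month": 3, "year": 2019},
    {"id": 3, "name": "C", "month": 1, "year": 2018},
]
# Same data, but delivered by the (hypothetical) backend in another order.
events_b = list(reversed(events_a))

requested = ["year"]
effective = ["year", "month", "name", "id"]

# With only the requested key, ties on year keep whatever order the backend gave.
unstable = ids(sort_events(events_a, requested)) != ids(sort_events(events_b, requested))
# With the effective keys, both inputs produce the same sequence.
stable = ids(sort_events(events_a, effective)) == ids(sort_events(events_b, effective))
```

Because `id` is unique in this sketch, the effective key is a total order, which is what makes page boundaries reproducible between requests.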


Too bad you could not join @pietercolpaert. Here are my additional thoughts, some of which have been discussed during the call, some of which have not:

  1. We discussed mandating the use of rdf:List when the server decides that the order is important for the client. This is the only way which removes the necessity for the client to deeply understand the proposed hydra:orderedBy description.
  2. Still, I'm convinced that we need this description, but it can be simplified when the members are actually an ordered rdf:List.

What I mean by the second point is that once the response triples are actually ordered, then the client should not need to know how to actually perform the ordering. Nor should they have to. This completely eliminates the necessity for castTo and

pietercolpaert commented 4 years ago

@tpluscode I think order within the page is not interesting. The overhead for a client to do the ordering is minimal. I mainly want to describe ordering between pages: I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.
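
The pruning idea could look roughly like this on the client side. It is a sketch under assumptions: the `Page` tuple, the `fetch` callback and the integer year values are hypothetical, and the view is assumed to be declared as ascending between pages.

```python
from typing import Callable, Dict, Iterator, List, NamedTuple, Optional


class Page(NamedTuple):
    members: List[int]   # year value of each member (order within the page is irrelevant)
    next: Optional[str]  # URL of the next page, if any


def members_up_to(fetch: Callable[[str], Page], first_url: str, max_year: int) -> Iterator[int]:
    """Yield all members with year <= max_year, following next links only as long as needed."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from (y for y in page.members if y <= max_year)
        # Ascending between pages: once a page holds a value above the bound,
        # no later page can hold a value at or below it.
        if any(y > max_year for y in page.members):
            break
        url = page.next


pages: Dict[str, Page] = {
    "?page=1": Page([2015, 2014], "?page=2"),
    "?page=2": Page([2016, 2015], "?page=3"),
    "?page=3": Page([2017, 2016], None),
}
fetched: List[str] = []


def fetch(url: str) -> Page:
    fetched.append(url)
    return pages[url]


result = list(members_up_to(fetch, "?page=1", 2015))
```

In this run the third page is never requested, which is the saving the order description is meant to enable.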

@alien-mcl For client-initiated ordering, please discuss in https://github.com/HydraCG/Specifications/issues/6. This issue is particularly focused on helping clients to prune their search space.

tpluscode commented 4 years ago

I think order within the page is not interesting

You really think so? It's not about runtime overhead but implementation complexity. The spec and a generic client will have to be very sophisticated to support a universal order description. Start adding paths and castTo and it's almost certainly going to become a nightmare. Not to mention that the particular field used to sort may not even be part of the response itself, which will prevent any kind of in-memory sorting. rdf:List is the easiest way to avoid all of those issues and simplify the order description.

I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.

Maybe it's not the best approach to try to standardise? Given the year example, the first design choice would be to filter and not have the client figure out the contents of further pages by analysing how the page got sorted and what are the contents.

pietercolpaert commented 4 years ago

I think order within the page is not interesting

You really think so?

Yes, and looking at how the Linked Data Platform is doing it, I’m not the only one with that opinion. I quote from their spec:

There are many cases where an ordering of the members of a container is important. LDP does not provide any particular support for server ordering of members in containers, because any client can order the members in any way it chooses based on the value of any available property of the members.

I want a client to know that it will not find more items in the next pages, as it won’t match its filter any longer.

Maybe it's not the best approach to try to standardise? Given the year example, the first design choice would be to filter and not have the client figure out the contents of further pages by analysing how the page got sorted and what are the contents.

My use case is for Web APIs where supporting any kind of dynamic ordering/filtering on the server side would become too expensive (cf. the Linked Data Fragments axis, where we want to find a trade-off between server and client effort). In order to optimize caching, I want to minimize the number of orderings I support, and describe these in-band.

I think indeed that we could hold a long debate on whether the design would be better or worse, but that is not an argument for not supporting this small addition to the spec in Hydra.

tpluscode commented 4 years ago

looking at how the Linked Data Platform is doing it

The LDP quote seems to suggest that the client should be responsible for sorting the container members in-memory? As in, fetch 1 million members and sort in the browser?

Maybe I'm misreading it. Is it placed out of context?

that is not an argument for not supporting this small addition to the spec in Hydra.

It is because Hydra should not just be a set of various (potentially impractical) descriptions but also the way for generic clients to consume them. It is a bad idea to standardise descriptions which cannot be easily coded against IMO. That's why I think we should first consider the desired result and also different scenarios. And only then figure out the standard solution.

pietercolpaert commented 3 years ago

Discussion moved to https://github.com/HydraCG/extensions/issues/4