facilityregistry / ihe

Issue and discussion tracking for IHE profiles related to facility registries (CSD, etc)

Requiring XQuery as a generic query language is not required by use cases #5

Open edjez opened 11 years ago

edjez commented 11 years ago

Summary

Currently the IHE CSD profile notes using XQuery as a generic language in which to send in queries. Because XQuery has such a large surface area, CSD then goes on to say:

REQUIRE a set of XQueries to be supported for conformance testing purposes.

If all CSD use cases can be met with a set of parametrized queries, it could just specify those and end up leaner and more focused.

Issues

Having an open query language as a request format has the following issues:

djritz commented 11 years ago

The idea of a short list of parameterized queries was discussed in committee. Your points are well taken, and were raised during the original discussion. There is, presently, support to move CSD forward to public comment using XQuery. You have correctly pointed out that an implementer, at their option, can strictly support only the mandatory queries and, at their option, simply parse out what they want from the inbound XQuery as parameterized tokens. The presently proposed approach would seem to be, therefore, "no hardship" for an implementer who does NOT want to support additional functionality.

edjez commented 11 years ago

OK; could we state the reasons the committee gave in support of moving forward with XQuery, so this group can understand the forces CSD is under?

To paraphrase:

  1. CSD will mandate that implementers support a certain number of parametrized queries, which will be expressed as pre-fixed XQuery strings;
  2. It will be optional for implementers to support any additional functional aspects of XQuery.

There is hardship to the CSD authors/editors (e.g. yourself) and some to implementers as even in a small domain of tokenized queries there are many degrees of freedom and semantically equivalent queries that may not be processed or interpreted equivalently, for example:

/Facilities/facility[@ID = 12345]

not equal to

/Facilities/facility[   @ID    =   12345]    //notice extra spaces

/Facilities/facility[@ID = 12345]

not equal to

/Facilities/facility[@ID eq 12345]   //notice 'eq' instead of '=' operator

Then, if you support all of XQuery, supporting the full gamut of declarative queries is one thing, and supporting full FLWOR expressions is yet another can of worms.
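To make the "parse out the tokens" option concrete, here is a minimal Python sketch of an implementer that only supports one canned query and extracts its parameter. It assumes the query shape from the examples above, and it assumes the implementer chooses to treat the whitespace and `=` / `eq` spellings as the same request; the function name `parse_facility_id` is hypothetical and nothing here comes from the CSD text itself.

```python
import re
from typing import Optional

# Assumed canned query shape, taken from the examples in this comment:
#   /Facilities/facility[@ID = 12345]
# The pattern tolerates extra whitespace and accepts either '=' or 'eq'.
FACILITY_BY_ID = re.compile(
    r"^\s*/Facilities/facility\[\s*@ID(?:\s*=\s*|\s+eq\s+)(\d+)\s*\]\s*$"
)


def parse_facility_id(query: str) -> Optional[int]:
    """Return the facility ID if the query matches the canned form, else None."""
    match = FACILITY_BY_ID.match(query)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for q in (
        "/Facilities/facility[@ID = 12345]",
        "/Facilities/facility[   @ID    =   12345]",
        "/Facilities/facility[@ID eq 12345]",
        "/Facilities/facility[@ID = 12345]/name",  # not the canned form
    ):
        print(repr(q), "->", parse_facility_id(q))
```

Every spelling the pattern does not anticipate is rejected, so each tolerated variation has to be enumerated by hand. That is the hardship described above.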

It is a pity we did not get the chance in time to lay out the reasoning for not exposing XQuery as the query language. CSD could support the same exchanges without XQuery, and could optionally be complemented by another XQuery-dedicated profile.

I think this discussion is also helpful in the OHIE community to illustrate the risks inherent to embedding sub-language profiles into larger specs.

djritz commented 11 years ago

The role of the CSD profile is to support secure, standards-based querying of related information across Organizations, Facilities, Services and Providers -- and optionally a freeBusy service. Presently these data are not all in one place, so we need to be able to express the interrelationships so that we can do a query against a JOIN of the content.

Is this easy to do?

edjez commented 11 years ago

Expressing the interrelationships in secure and standards-based ways is easy to do, if the services owning each piece of information follow good practices on exposing resources in addressable ways with appropriately specified interlinking fields (REST service interfaces are a good way to achieve this). Some of these best practices are already standards based and rapidly becoming industry standards of their own.

Aggregators can then decide to do queries with traversals, eager loading, and whatever filtering they decide, while respecting the boundaries and autonomy of the underlying services.

A challenge with CSD as it stands today is that it adds forces against service orientation, adding incentives to glom everything up into one database/store just to make the query processing simpler. A caching aggregator could do the job, but LMICs benefit from deployments that have fewer moving parts. Maybe future versions of CSD will approach this differently, and I'm sure we'll learn a lot from this version.

djritz commented 11 years ago

This message (see below) was just posted to the IHE ITI tech committee Google group. I’m “cross posting” it here, too.

I cut/paste the GitHub email reply address into Outlook… Hope this works… ;-)

Derek.

From: ititech@googlegroups.com [mailto:ititech@googlegroups.com] On Behalf Of derek.ritz Sent: June 6, 2013 5:37 PM To: ititech@googlegroups.com Subject: [ititech:3799] A modified approach re: CSD

Hi all.

Based on feedback from 3 implementer stakeholders, a change in the CSD profile's transaction flow is proposed. The draft for public comment is being modified accordingly and will be posted soon to the ftp site. In the meantime, the new approach is described by the PPT deck (attached).

The gist of the change is to support Consumer actors using XQuery to query against a CSD-conformant XML document maintained by the Manager actor. As before, optional FreeBusy information may also be queried. The XQuery and the optional FreeBusy query are contained in two elements of a simple XML document which is POSTed to the Manager using https. The Manager returns the result(s) as an XML document.
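As a rough illustration of the request described in the paragraph above, here is a Python sketch that builds a two-element XML document carrying the XQuery and the optional FreeBusy query. The element names `csdRequest`, `xquery` and `freeBusyQuery` are assumptions made for this sketch; the post does not name them, and the actual draft may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_request(xquery: str, free_busy: Optional[str] = None) -> bytes:
    """Build the request body a Consumer would POST to the Manager.

    All element names are hypothetical placeholders, not taken from CSD.
    """
    root = ET.Element("csdRequest")
    # ElementTree escapes characters such as '<' in the query text.
    ET.SubElement(root, "xquery").text = xquery
    if free_busy is not None:
        ET.SubElement(root, "freeBusyQuery").text = free_busy
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    body = build_request("/Facilities/facility[@ID = 12345]")
    print(body.decode("utf-8"))
```

The resulting bytes could then be sent over https with any HTTP client; that step is left out so the sketch stays runnable offline.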

The Manager maintains a replica XML document. This replica is kept up to date by the Manager, who regularly polls Directory actors (on a period, e.g. hourly or daily, to be established by the implementing jurisdiction). The Manager sends an https GET message to each Directory with a timestamp parameter indicating the timestamp of the last refresh. It is the responsibility of the Directory to return to the Manager an XML document, conformant with CSD's XML Schema (xsd) specification, that contains all the OrganizationDirectory, FacilityDirectory, ServiceDirectory and/or ProviderDirectory elements that have been inserted or updated since the specified timestamp. The Manager uses these returned XML documents to refresh its CSD replica.
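The refresh step in the paragraph above could look roughly like the following Python sketch, which upserts the entries returned by a Directory into the Manager's replica. The four directory element names come from the paragraph; the `CSD` root element, the per-entry `id` attribute used as the key, and the function name `merge_update` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Directory element names as listed in the post.
DIRECTORIES = (
    "OrganizationDirectory",
    "FacilityDirectory",
    "ServiceDirectory",
    "ProviderDirectory",
)


def merge_update(replica: ET.Element, update: ET.Element) -> None:
    """Insert or replace, in place, every entry of `update` in `replica`.

    Entries are matched on a hypothetical `id` attribute.
    """
    for name in DIRECTORIES:
        incoming = update.find(name)
        if incoming is None:
            continue
        target = replica.find(name)
        if target is None:
            target = ET.SubElement(replica, name)
        by_id = {child.get("id"): child for child in target}
        for entry in list(incoming):
            key = entry.get("id")
            existing = by_id.get(key)
            if existing is not None:
                target.remove(existing)  # drop the stale copy
            target.append(entry)
            by_id[key] = entry


if __name__ == "__main__":
    replica = ET.fromstring(
        '<CSD><FacilityDirectory><facility id="1"><name>Old</name></facility>'
        "</FacilityDirectory></CSD>"
    )
    update = ET.fromstring(
        '<CSD><FacilityDirectory><facility id="1"><name>New</name></facility>'
        "</FacilityDirectory></CSD>"
    )
    merge_update(replica, update)
    print(ET.tostring(replica, encoding="unicode"))
```

The sketch only covers inserts and updates, which is all the paragraph describes; how deletions would reach the replica is not addressed here.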

This modified approach means Directory actors will not have to natively support XQuery. As long as they can generate a conformant XML document in response to the Manager's poll, they will be able to satisfy the CSD profile's requirements. The Manager will respond to Consumer actors' XQuery queries using its replica XML document. This should improve performance and it will usefully protect the Directory actors from potential security hazards; Consumers, as before, will not interact directly with Directory actors. The optional FreeBusy support remains largely unchanged.

Comments are welcome,

Derek.

~~ej edited 6/6; formatting newlines to make Derek's post more readable. Unsuccessfully. Thanks Outlook.

edjez commented 11 years ago

The PPT Derek mentions can now be seen here:

https://github.com/facilityregistry/ihe/blob/master/CSD/Docs/13-06-05%20A%20Modified%20Approach%20for%20CSD.pptx

bobjolliffe commented 11 years ago

Still quite a few misgivings, though it is better than before.

The idea of aggregating/caching the data from different directories is important if you want to be able to scale. That is good. It's an open question whether the participating authoritative directories should have to speak CSD or whether the burden of translation could be borne by the CSD manager, but that is perhaps detail.

I've had a brief play with xquerying on orgunits from Nigeria (around 40000 facilities) and very basic stuff which happens at the blink of an eye from a relational database is serious grunt work for java based xquery processors - both my oxygen and exist-db died under the load. Probably they could survive with more memory allocation and/or the non-opensource saxon-ee but the prognosis is not good for more complex associations between nodes. The c-compiled Zorba processor performed orders of magnitude better which I guess is not surprising.

I think the way I would approach the front end interfaces would indeed be to provide some canned recipes of types of queries which can be encapsulated in REST endpoints and hide the XQuery beneath the surface. It's a great idea to have a flexible query language but (i) I think it imposes unnecessary tech choices to meet the known use cases, (ii) there are questions of scalability which need to be researched before prescribing so narrowly, and (iii) tying down the XQuery engine to a safe subset of the language could be a tough challenge.
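A small Python sketch of what such a canned recipe might look like behind an endpoint: the caller supplies only a recipe name and validated parameters, and the XQuery text is produced server-side. The recipe name `facility-by-id`, its query text (borrowed from the examples earlier in this thread) and the function `render_query` are hypothetical, not part of any CSD draft.

```python
import re
from string import Template

# Each recipe pairs a query template with one validator per parameter.
# string.Template's $name placeholders would clash with XQuery variables,
# so a real implementation would probably want a different placeholder syntax.
CANNED_QUERIES = {
    "facility-by-id": (
        Template("/Facilities/facility[@ID = $id]"),
        {"id": re.compile(r"\d+")},
    ),
}


def render_query(name: str, **params: str) -> str:
    """Return the XQuery string for a canned recipe, or raise on bad input."""
    template, validators = CANNED_QUERIES[name]  # KeyError if unknown recipe
    if set(params) != set(validators):
        raise ValueError(f"expected parameters {sorted(validators)}")
    for key, pattern in validators.items():
        if not pattern.fullmatch(params[key]):
            raise ValueError(f"invalid value for {key!r}")
    return template.substitute(params)


if __name__ == "__main__":
    print(render_query("facility-by-id", id="12345"))
```

Because callers never submit query text, the engine only ever sees queries the operator has written, which sidesteps point (iii) for the canned cases.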

I have chatted a bit offline with Carl and Derek and they know that there is a lot I like about the proposal and I'd really like to see it work. What would swing it for me is to see some sort of working prototype. That might emerge during the public comment period. If not, then I can see the period being used to gather proposals for the canned queries.

Bob

edjez commented 11 years ago

+11 on Bob's recommendation:

I think the way I would approach the front end interfaces would indeed be to provide some canned recipes of types of queries which can be encapsulated in REST endpoints and hide the Xquery beneath the surface.

Agree, more prototyping would inform the spec a lot.

edjez commented 11 years ago

My point was not that caching is a bad strategy (in some contexts it could be good), but when an interop/exchange spec has to get deep into functional descriptions of components and mandate behaviors to meet non-specified crosscutting requirements there is a 'smell', and CSD could benefit from us analyzing that a bit. Maybe some of this is profile implementation guidelines or "implementers' notes", not profile specification.

Exchange profiles with good longevity, reuse, and adoption tend to boil down to "Over such pipe I send you this and you return that". It's part of the encapsulation principle: an exchange profile needs to help all parties know less about what's on the other side while maximizing freedom to do whatever may be needed on their own.

bobjolliffe commented 11 years ago

I think the problem (or at least a problem) CSD is trying to solve is how to provide a flexible query language: one that is at least as good as, and a bit better than, ldap.

xquery is certainly that. And there are various ways people have figured out how to post it around. For example, the latest iteration of the proposal suggests something like the extended query POST requests which I see are supported by eXist-db ( http://exist-db.org/exist/apps/doc/devguide_rest.xml). I've just tried them out and after some fiddling they work ok on my 40000 Nigerian orgunits.

I think if you are going to provide an xquery engine (and expose it and thus require it) then you pretty much are mandating an xml database of some sort. Smaller countries like Rwanda might get away with managing an in-memory XDM document incarnated off the file system, but otherwise you are looking at the likes of eXist, baseX or even marklogic to achieve any kind of scale. Given that you are dealing with end to end xml workflow I actually think that's ok, but I guess I am an xml guy.

Otherwise it pretty much conforms to Ed's adage for good exchange profiles, with slight rephrasing: Over such pipe I send you this Xquery and you return whatever it is my xquery is allowed to do.

The problem is that even though it need not be stated, the mere requirement of supporting xquery on a large xml CSD document implies a deep enough xml stack of sorts.

The questions we need answered are (i) how much flexibility do the prospective users REALLY need? Is there any way to find out? Is it really more than what could be reasonably templated?

(ii) if they really do need the max degrees of freedom xquery could give them then how must the spec be tightened to minimise the damage that could be done with an xquery script gone wild. version 1.0 or 3.0? modules? ...

(iii) how much xquery would the poor buggers need to learn? I can foresee that even if you did expose an xquery endpoint for power-users, you would still have to provide something simpler for the majority to just punch in parameters and get results.

But it comes down to (i). If they really need a flexible query language, and experience from HPD and ldap seems to be telling them that, then xquery certainly is that. Though quite a beast to get under control. Are there other alternatives?

mberg commented 11 years ago

This whole thread and the fact that we're trying to support things like XQuery when I thought we were just planning on serving up basic json scares me.

Maybe this complexity is needed but I feel we're getting a bit away from the original spirit of keeping it simple when we started this.

Thanks,

Matt

edjez commented 11 years ago

Matt, sorry for the confusion. This is feedback to the IHE profile (see the facilityregistry/IHE repo) - a spec that is orthogonal to other interfaces (FRED, GeoJSON etc) and that so far aggregates Providers, Facilities, Organizations.

You can see a relatively recent version here: https://github.com/facilityregistry/ihe/blob/master/CSD/Docs/IHE_ITI_TF_Supplement_Care_Services_Discovery_13-05-03.docx @djritz if you have a newer version to point to pls share it. I can post it.

mberg commented 11 years ago

ah ok sorry about that :)

ghost commented 11 years ago

Just a question.. If CSD is seen as a document why not piggyback on the existing ITI specs for document exchange and query? (Similar to how XDS-I specializes XDS for imaging content). IMO this would be more consistent with existing XD* infrastructure. Consumers could simply use already existing XD* operations to maintain a CSD free/busy document related to a particular facility / provider / org. Additionally organizations could just provide a CSD export of facilities, providers and orgs in their affinity domain. Apologies if this is off topic, just my first reaction when flipping through the ppt.

djritz commented 11 years ago

Hi Justin. No, the CSD profile is not thinking about a document in the same way XDS thinks of a document. It is not about trying to move the entire CSD document from a repository to the client; it is more about supporting a Consumer who wants to return elements of an XML document using XQuery in the same way a client might want to return specific rows and columns from a relational database using SQL. The XML document itself is "constructed" by receiving snippets of XML from multiple Directory actors and assembling them into a navigable, searchable, queryable cached "document" based on an XML schema which interlinks organizations, facilities, services and providers.

Hope that helps,

Derek.

PS: Ed, I will send you a link to the latest CSD profile draft when it is posted to the IHE ftp server.

litlfred commented 11 years ago

@mberg I do hope that, to facilitate support for the HPD and CSD profiles, the FRED API will explicitly support an XML format per: https://github.com/facilityregistry/fred-api/issues/62