buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0
2 stars 0 forks source link

is the "@" convention in lang tag necessary? #16

Closed xristy closed 6 years ago

xristy commented 6 years ago

is the convention you've used like:

?L_TITLE?L_TITLELANG

necessary or if I want to, can I write:

?L_TITLE@?L_TITLELANG

?

MarcAgate commented 6 years ago

Hi Chris, You must use ?L_TITLE?L_TITLELANG with (for instance) L_TITLELANG=@bo-x -ewts the @ parameter cannot be part of the template because it is considered as a Threat by the InjectionTracker. Marc

Le jeudi 01 février 2018 à 07:52 -0800, Chris Tomlinson a écrit :

is the convention you've used like:

?L_TITLE?L_TITLELANG

necessary or if I want to, can I write:

?L_TITLE@?L_TITLELANG

?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub, or mute the thread.

     

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c5 5493e4bb","name":"GitHub"},"entity":{"external_key":"github/BuddhistD igitalResourceCenter/lds- pdi","title":"BuddhistDigitalResourceCenter/lds- pdi","subtitle":"GitHub repository","main_image_url":"https://cloud.g ithubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc- 7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent .com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed- b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/BuddhistDigitalResourceCenter/lds- pdi"}},"updates":{"snippets":[{"icon":"DESCRIPTION","message":"is the \"@\" convention in lang tag necessary? (#16)"}],"action":{"name":"View Issue","url":"https://github.com/BuddhistDigitalResourceCenter/lds- pdi/issues/16"}}}

xristy commented 6 years ago

I guess it would be good to clarify the injection threat. Links to discussion?

Maybe we could have ?T_TITLE with the meaning that a valid language tag must be provided and permit writing

?L_TITLE@?T_TITLE

Rather than another issue, is the use of all caps for template variables part of the code or is the set of template variables determined by:

#QueryParams=

so that one might write:

#QueryParams=l_name,l_lang,i_lim
MarcAgate commented 6 years ago

Could you tell me more about the issue coming from using ?L_TITLE?L_TITLELANG ?

Regarding @ viewed as an Injection threat, it is the way the ParameterizedSparqlString class works (it just refuses @ in the query (it might even be sparql itself since it uses the lang function - I have to check). (http://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ParameterizedSparqlString.html).

eroux commented 6 years ago

Well, I see that currently we're not using the lang tags in a proper way. It seems to me that having the following mechanism would be more logical:

wdyt?

Also a less technical concern is that upper case letters are perceived as ugly, but that's more a subjective matter.

xristy commented 6 years ago

The user documentation for Parameterized SPARQL String is also useful.

It appears that the answer to my question about whether we could use lower-case versus upper case param names should be YES since those names are under our control and not required by ParameterizedSparqlString. You're using the L_, I_ and R_ as conventions to determine when to use setLiteral and so on - which seems fine and using a case-insensitive test would be easy.

As Élie points out the handling of language tags may not be quite correct, although as I understand the PSS, it doesn't check validity of the string resulting from substitution until asQuery is invoked. If this is the case then it seems that @ should not be an issue in the string!? The "@" is apparently not being interpreted until asQuery (otherwise I don't see how current queries are working).

Élie's proposal is sound in terms of permitting some validation of language tags which seems beneficial.

xristy commented 6 years ago

setLiteral(String var, String value, String lang) would presumably be the form to use, rather than the positional version.

I assume that the

#QueryParams=....

is used to validate that the params supplied in a request are the ones the template is designed to accept.

MarcAgate commented 6 years ago

Right, #QueryParams is used for validation against templated request. Obviously there is no restriction regarding params case: I suggested upper case for a better readability of the templates : that matters quite a bit when you are designing query templates. I agree we are not correctly setting up languages in templates. I am going to implement your solution and I hope jena;text will be fine with it. I am not sure that setLiteral(?S_NAME, "whatever", LT_NAME) applied to text:query ( ?S_NAME), will produce the expected jena:text/lucene syntax. I must test that first

Regarding the BCP-47 thing I will use Locale (https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html)

MarcAgate commented 6 years ago

Just tested the produced query syntax and everything's fine.

xristy commented 6 years ago

setLiteral(?S_NAME, "whatever", LT_NAME) will in fact work very well with jena:text. There is code in the interface to pay attention to rdf langString literals.

Regarding the BCP-47 there is a document that Élie has written that lays out the BUDA conventions for using BCP-47.

xristy commented 6 years ago

Obviously there is no restriction regarding params case.

So you will be changing the tests to allow arbitrary case?

MarcAgate commented 6 years ago

I would prefer to keep everything uppercase (even though you can have lower case params names with uppercase prefix). The very practical reason to that is readability (in particular when I look for params into a big non pretty json string, either in the browser or in the console - add &jsonOut=true to the url of any request of the index page and you'll understand my point). Moreover, once all the demo stuff will be gone, replaced by BLMP, we will be the only ones to see these strings, most likely for debugging or new development.

eroux commented 6 years ago

About the test for bcp-47 I think it makes sense to keep things generic and use the Locale method, if users are looking for strings in Klingon they won't find any but that shouldn't bother the sparql query engine...

xristy commented 6 years ago

You may keep what you write in upper case of course; however, the template author should have the freedom to choose. The editing/authoring facility will still present a query with template variables to the author and it is a matter-of-taste as to whether the all uppercase. mixed case, or lower case work for the person authoring the template. The underlying tool should be flexible rather than rigid.

xristy commented 6 years ago

How will the Locale class be used as far as user input of language tags? If tags are being validated then won't it be helpful to have at least a list of BUDA private-use-case extensions?

eroux commented 6 years ago

The Locale class can validate if a lang tag is a real lang tag (and not some SPARQL injection or whatever), that's all lds should do IMO. If we have a list of lang tags inside the lds, that means redeploying each time we change or add a language, or having a complex mechanism for that, I don't think it's necessary... in the UI, of course, users will choose in a predefined list, but that would be on another level... wdyt?

MarcAgate commented 6 years ago

I used the Locale.Builder class for BCP 47 validation in InjectionTracker. Regarding lang settings in query templates, the convention is as follows: A langage specific request will use L prefixed params for the string to be searched and LG prefix to indicate the associated lang.

Ex: L_NAME="chos rgyal" LG_NAME=bo-x-ewts

or simply L_NAME="'od zer" with no corresponding LG_NAME param if the language is unspecified.