Open buckett opened 9 years ago
When attempting to parse a behaviour="cit(.,'uri://something')"
it would be good to know how I should parse the arguments.
in tei-pm, there should be a datatype for each parameter of a function. That should deal with this? XPaths are not quoted, strings are.
For example, how is a " escaped in a string? I'm guessing the existing implementation treats the function as an XSLT function, so the parsing rules are the same as the XSLT function parsing rules.
um. we have no idea! we don't know how we'd handle that in XSLT.
So are strings assumed to be XML encoded, so that a string of "Hello" said the policeman
should be written as &quot;Hello&quot; said the policeman?
That doesn't help you, because the XML parser expands the entities into Unicode anyway. I honestly don't know how to deal with this.
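To illustrate the point about entity expansion, here is a minimal sketch using Python's standard xml.etree.ElementTree. The <model> fragment is made up for the example and is not taken from any TEI Simple file. By the time the attribute value reaches whatever parses the behaviour, the &quot; entities have already become plain " characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a <model> element whose behaviour argument
# contains double quotes written as &quot; entities.
source = """<model behaviour="cit(., '&quot;Hello&quot; said the policeman')"/>"""

element = ET.fromstring(source)
value = element.get("behaviour")

# The XML parser has already expanded the entities, so the behaviour
# parser only ever sees literal double quotes.
print(value)
```

So XML-level escaping only gets the quote characters past the XML parser; the function-call syntax would still need its own rule for quotes inside a string argument.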
This came about because an XPath expression may contain a comma (I think) so I was thinking about how to parse the function to extract out the 2 XPath expressions for alternate(xpath,xpath)
ah. I see where you are going now. and I just met a similar problem. I just wrote behaviour="break('page',if (@n) then @n else @facs)" and it doesn't look right at all.
I am beginning to think we should change this spec to say that the XPath expression should be passed as a string, i.e. surrounded by quotes. Doesn't help with how to pass quotes, but does deal with the embedded comma.
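The embedded-comma problem can be sketched in Python. This is only an illustration under stated assumptions, not what any existing TEI Simple implementation does. The helper name split_behaviour is hypothetical. It assumes the value is a single call of the form name(arg, arg, ...) and splits only on commas that sit outside quotes, parentheses and square brackets.

```python
def split_behaviour(value):
    """Split 'name(arg1, arg2)' into (name, [args]) on top-level commas.

    Hypothetical helper, not part of any TEI tool. Commas inside quoted
    strings, parentheses or square brackets are left alone.
    """
    name, sep, rest = value.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"not a function call: {value!r}")
    body = rest[:-1]

    args, current = [], []
    depth = 0      # nesting level of ( and [
    quote = None   # the quote character we are inside, if any
    for ch in body:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None   # a doubled quote ('') simply reopens the string
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in "([":
            depth += 1
            current.append(ch)
        elif ch in ")]":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    if quote or depth != 0:
        raise ValueError(f"unbalanced quotes or brackets: {value!r}")
    last = "".join(current).strip()
    if last or args:
        args.append(last)
    return name.strip(), args


print(split_behaviour("cit(.,'uri://something')"))
print(split_behaviour("break('page',if (@n) then @n else @facs)"))
print(split_behaviour("alternate(concat(a, ','), b)"))
```

This handles commas nested inside function calls or string literals, but it cannot tell an unparenthesised XPath 2 sequence expression such as a, b from two separate arguments, and it ignores XPath comments. That ambiguity is one argument for requiring XPath arguments to be passed as quoted strings.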
I suggest elements should be used instead of attributes (for behaviour and predicate). Otherwise I think this is going to be a source of endless pain. EDIT: on the other hand, if this stuff will typically be implemented in XSLT etc then perhaps it makes sense to use attributes, so that encoders are forced to write XPath expressions in a way that will work in XSLT, however awkward it may make certain expressions.
Since this is XPath 2, we have the codepoints-to-string() function, but it's not pretty.
"concat(codepoints-to-string(34), 'Hello', codepoints-to-string(34), ' said the policeman')"
It's a fair point, Conal. I don't want to change horses mid-race when the problem right now is checking that the functionality is there, but after we have a stable 1.0 using attributes, it would be a good idea to reconsider the choice of using attributes rather than element children.
I've compared the TEI Simple DTD with the DTA schema. Simple is more generous than DTA, but DTA has the following elements that Simple does not allow for:
addName country foreName genName nameLink orgName persName roleName surname
Should we include them? I can see three different arguments in favour of doing so. First, DTA has been adopted by CLARIN as its base format. Other things being equal, there is a benefit if a text in that format validates under Simple.
Second, and perhaps more substantively, named entity extraction seems to be the chief, and often the only, thing that people are interested in when they work with texts.
Third, when I showed Simple to the Perseus folks, they were very interested in the processing model but objected to the exclusion of the name elements.
On the minus side, you can just use type attributes to sub-specify names, and Simple may run the risk of no longer being simple. Do we want to slide down that slippery slope?
I think we quite consciously have made the decision of excluding 'syntactic
sugar' options for types and subtypes of names, all for the sake of leaving
the editor with precisely one way of encoding things.
To accommodate DTA and other corpora we provided a conversion piece from
'general TEI' to 'Simple TEI' that converts all
the naming thing is hard. we can put back all the specific ones, but then we'd have to remove the generic @type version. would that actually be better? i.e. not to support
the conversion stylesheet is now in the TEI Stylesheets
There's no specification of how the behaviour attribute's value should be parsed. How should strings, URIs and XPath expressions be quoted?