BFO-ontology / BFO-2020

A repository for BFO 2020 artifacts specified in ISO 21838-2:2020

Use of the '@' symbol in Common Logic files causes Hets to throw a parsing error #60

Open dillerm opened 1 year ago

dillerm commented 1 year ago

The cl:comment in the first few lines of each of the Common Logic files contains an email address, which of course uses an '@' symbol. Whenever I try to load one of these files in the online Hets toolkit (rest.hets.eu), I get the following error message: "unexpected '@' / expecting ' ' '. [i.e., expecting a single quote]" (comment in brackets is my own).

I'm still not absolutely certain why this is an issue, but looking at the CLIF specification I noticed that the '@' symbol is not listed in Section A.2.2.4 under the characters that can be used to form lexical tokens (see attached). Because (1) this email address is part of a quoted string, (2) quoted strings are considered lexical tokens in CLIF, and (3) lexical tokens can only contain members of the sets of characters, delimiters, or whitespace that are defined in the specification, I believe this is why Hets is throwing this error.

Solution: Replacing the '@' symbol with '(at)' or something along those lines fixes this. Please note that, to my knowledge, you unfortunately cannot escape it with a backslash, because in CLIF the backslash is reserved for escaping single or double quotes within quoted strings.
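
A minimal sketch of that replacement in Python, in case it helps with a batch fix. It assumes the email address sits inside a single-quoted CLIF string; the function name `replace_at_in_quoted_strings` and the sample address are hypothetical and not taken from the repository.

```python
import re

# Matches a single-quoted string, allowing backslash-escaped characters inside.
# Assumption: the comments in question are single-quoted CLIF strings.
QUOTED_STRING = re.compile(r"'(?:\\.|[^'\\])*'")


def replace_at_in_quoted_strings(clif_text: str, replacement: str = "(at)") -> str:
    """Replace every '@' that occurs inside a single-quoted string."""
    return QUOTED_STRING.sub(
        lambda match: match.group(0).replace("@", replacement),
        clif_text,
    )


# Hypothetical sample line; the real files contain a different address.
sample = "(cl:comment 'Contact: someone@example.org')"
fixed = replace_at_in_quoted_strings(sample)
print(fixed)
```

Text outside quoted strings is left untouched, so the rest of the file is not modified.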

[Attached screenshot: Screen Shot 2023-03-27 at 6 42 47 PM]

alanruttenberg commented 1 year ago

This looks like a spec bug. It says: "This includes all the alphanumeric characters", but then that disagrees with the production. Who wins? It can't be an intentional omission.

alanruttenberg commented 1 year ago

Not that it's a better option, but you can use any Unicode by escaping with \u or \U. Has HETS been updated for the 2018 Common Logic spec? If not there might be other problems. cl:outdiscourse is defined in 2018 but not 2007. Looks like cl:ttl is also new.
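
A rough sketch of what that escaping could look like, in Python. It assumes the escape takes four hexadecimal digits after \u and six after \U, which is how I read the CLIF grammar; please check this against the spec before relying on it. The helper names `clif_escape` and `escape_at_signs` are made up for illustration.

```python
def clif_escape(ch: str) -> str:
    """Return a backslash escape for one character.

    Assumption: \\u is followed by four hex digits and \\U by six.
    """
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return "\\u{:04X}".format(code_point)
    return "\\U{:06X}".format(code_point)


def escape_at_signs(text: str) -> str:
    """Replace each '@' with its escaped form."""
    return text.replace("@", clif_escape("@"))


# Hypothetical sample address, not the one in the BFO files.
escaped = escape_at_signs("someone@example.org")
print(escaped)
```

Whether Hets accepts the escaped form is a separate question that this sketch does not answer.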

alanruttenberg commented 1 year ago

I've changed my source to use (at) going forward. If you want to submit a PR fixing the current files, that's welcome. Otherwise I'll get to it at some point.

alanruttenberg commented 1 year ago

It was pointed out to me that @ isn't an alphanumeric character. But the sentence starts "char is all the remaining ASCII non-control characters", so that includes @.

dillerm commented 1 year ago

@alanruttenberg, yeah, I find it very bizarre as well and thought it might have been omitted by mistake. I might reach out to the Hets folks to see if this is a feature or a bug on either their end or the spec's. I can also make the pull request tomorrow.

dillerm commented 1 year ago

Oh, it looks like I misread the spec and \u, like you said, can be used to escape any Unicode.