COMCIFS / cif_core

The IUCr CIF core dictionary
CIF2 grammar and `triple-quoted-string` and `quoted-string` #453

Closed rowlesmr closed 11 months ago

rowlesmr commented 11 months ago

I've been putting together a CIF parser and I have another question. I may also be totally off-base, as I'm translating the grammar to a PEG, which has different characteristics from EBNF.

I think triple-quoted-strings should be tested before quoted-strings, unlike the order in the current EBNF grammar:

https://github.com/COMCIFS/cif_core/blob/02b65fa614a4d087a428caa0e3e96892a6c50b8b/CIF2-EBNF.txt#L183-L187 and https://github.com/COMCIFS/cif_core/blob/02b65fa614a4d087a428caa0e3e96892a6c50b8b/CIF2-EBNF.txt#L164-L165

quoted-string and triple-quoted-string are defined as follows (taking only the "-delimited forms to keep things simple; the same argument applies to '):

quoted-string =  quote-delim, quote-content, quote-delim ;
quote-content = { char - quote-delim } ;
quote-delim = '"' ;

triple-quoted-string = quote3-delim, quote3-content, quote3-delim  ;
quote3-delim = '"""' ;
quote3-content = { [ '"', [ '"' ] ], not-quote, { not-quote } } ;
not-quote = allchars - '"' ;

Valid quoted-strings are "hello", "", and "hi 'there' you".

Valid triple-quoted-strings are """hello""", """""", and """hi 'there' you \n"how" \n are ""you""?""".
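For anyone wanting to check these examples mechanically, here is a rough sketch that translates the two rules into Python regular expressions. It rests on some assumptions not spelled out above: `char` is taken to exclude line terminators, `allchars` is taken to include them, and `quote3-content` is rewritten as "up to two quotes followed by one non-quote, repeated", which should accept the same strings as the EBNF rule. The names `QUOTED`, `TRIPLE`, `is_quoted` and `is_triple_quoted` are made up for illustration.

```python
import re

# quoted-string: quote-delim, { char - quote-delim }, quote-delim
# Assumption: `char` excludes line terminators.
QUOTED = re.compile(r'"[^"\r\n]*"')

# triple-quoted-string: quote3-delim, quote3-content, quote3-delim
# quote3-content is rewritten here as ("{0,2} not-quote)*, i.e. runs of at
# most two quotes, each followed by a non-quote character.
TRIPLE = re.compile(r'"""(?:"{0,2}[^"])*"""')


def is_quoted(s: str) -> bool:
    """True if the whole of s is a quoted-string."""
    return QUOTED.fullmatch(s) is not None


def is_triple_quoted(s: str) -> bool:
    """True if the whole of s is a triple-quoted-string."""
    return TRIPLE.fullmatch(s) is not None


quoted_examples = ['"hello"', '""', '"hi \'there\' you"']
triple_examples = [
    '"""hello"""',
    '""""""',
    '"""hi \'there\' you \n"how" \n are ""you""?"""',
]

print(all(is_quoted(s) for s in quoted_examples))
print(all(is_triple_quoted(s) for s in triple_examples))
```

Note that `"""hello"""` as a whole is not a quoted-string under these patterns, so the two rules do not overlap on complete values; the trouble only shows up when matching prefixes.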

If I try to parse """hello""" as a quoted-string first, as the order in nospace-value and table-entry suggests, I get "", "hello", and "". Where I expect a single string, I get three, so triple-quoted-strings should be tested for first.

Or is this just how EBNF works...?
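The behaviour can be reproduced with a small ordered-choice scanner. This is only a sketch, not the actual parser: `scan` is a hypothetical helper that tries each alternative in turn at the current position and commits to the first one that matches, and the regexes are my own translation of the rules (assuming `char` excludes line terminators).

```python
import re

# Hypothetical regex translations of the two grammar rules.
QUOTED = re.compile(r'"[^"\r\n]*"')
TRIPLE = re.compile(r'"""(?:"{0,2}[^"])*"""')


def scan(text, alternatives):
    """Split text into tokens using ordered choice.

    At each position the alternatives are tried in the given order and the
    first one that matches is taken; later alternatives are never consulted.
    """
    pos, tokens = 0, []
    while pos < len(text):
        for pattern in alternatives:
            m = pattern.match(text, pos)
            if m:
                tokens.append(m.group())
                pos = m.end()
                break
        else:
            raise ValueError(f"no alternative matches at offset {pos}")
    return tokens


print(scan('"""hello"""', [QUOTED, TRIPLE]))  # ['""', '"hello"', '""']
print(scan('"""hello"""', [TRIPLE, QUOTED]))  # ['"""hello"""']
```

With quoted-string listed first, the empty string `""` is a valid prefix match, so the scanner commits to it before the triple-quoted alternative is ever tried.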

vaitkus commented 11 months ago

Unlike in PEG, the order of alternatives in EBNF is unimportant, therefore:

 nospace-value = 
     quoted-string 
   | triple-quoted-string 
   | list 
   | table ; 

is equivalent to

 nospace-value =
   triple-quoted-string 
   | quoted-string 
   | list 
   | table ; 

I guess in the case of your program it makes sense to switch them.
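One way to picture the equivalence: an EBNF rule only says which strings belong to the language, so a value is a nospace-value if any alternative derives all of it. A minimal sketch of that reading follows, covering only the two string alternatives (list and table are left out) and using my own regex translations of the rules; `ALTERNATIVES` and `derives` are illustrative names, not part of the grammar.

```python
import re
from itertools import permutations

# Hypothetical regex translations; assumes `char` excludes line terminators.
ALTERNATIVES = {
    "quoted-string": re.compile(r'"[^"\r\n]*"'),
    "triple-quoted-string": re.compile(r'"""(?:"{0,2}[^"])*"""'),
}


def derives(text, names):
    """True if any of the named alternatives derives the whole of text."""
    return any(ALTERNATIVES[name].fullmatch(text) for name in names)


samples = ['"hello"', '"""hello"""', '""', '""""""', '"unterminated']

# Try every ordering of the alternatives; the verdicts do not change.
for order in permutations(ALTERNATIVES):
    print(order, [derives(s, order) for s in samples])
```

Because membership is decided on the complete value rather than on the first prefix that matches, reordering the alternatives cannot change the result here.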

rowlesmr commented 11 months ago

> the order of alternatives in EBNF is unimportant

Roger. PEG uses ordered choice: the first match wins, even if there are other matches further down.

Now, on to figuring out why my text fields are broken.