COMCIFS / cif_core

The IUCr CIF core dictionary

CIF2.0 line termination? #444

Closed by rowlesmr 1 year ago

rowlesmr commented 1 year ago

I've been writing up my own parser for a little edification, and have a question about the line termination scheme.

Table 1 of [1] says that the line terminator is ?U+000D?, [?U+000A?] | ?U+000A?. By my understanding, [ ] represent something optional, | is an 'or', and , is concatenation.

AFAIK, this can be represented in a regex as (\r\n?)|\n.

I think that means the recognised line terminators are \n, \r\n, and \r.

Is that correct?

[1] https://scripts.iucr.org/cgi-bin/paper?S1600576715021871

vaitkus commented 1 year ago

You are correct on the overall interpretation of the grammar and on the allowed line-endings.
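
For anyone else writing a parser, here is a minimal Python sketch of that interpretation. It uses the regex from the question (with the redundant parentheses dropped) to split text into lines. The names `LINE_TERM` and `split_cif_lines` are hypothetical and only for illustration; they are not taken from the specification or from any existing CIF library.

```python
import re

# Regex form of the line-terminator rule quoted from Table 1:
# CR optionally followed by LF, or a lone LF.
LINE_TERM = re.compile(r"\r\n?|\n")


def split_cif_lines(text: str) -> list[str]:
    """Split text on CR LF, lone CR, or lone LF (hypothetical helper)."""
    return LINE_TERM.split(text)


if __name__ == "__main__":
    sample = "data_a\r\n_b 1\r_c 2\n_d 3"
    # CR LF is consumed as a single terminator, so no empty line
    # appears between "data_a" and "_b 1".
    print(split_cif_lines(sample))
    print(LINE_TERM.findall(sample))
```

Note that Python's built-in `str.splitlines()` is not a drop-in replacement here, because it also splits on other characters such as form feed and U+2028.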

Please close the issue if you find that your question has been sufficiently answered.