The XML spec only requires that, after the opening <!--, we read 0 or more non-hyphen characters (or rather, 0 or more runs of characters which do not contain two consecutive hyphens), followed by two hyphens and a close bracket. By that reading, <!----> should be a valid (but empty) comment.
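As a quick sanity check of that reading, the comment production from the XML spec can be transcribed into a regular expression. This is only a sketch in Python and has nothing to do with the SGML package itself; the names `XML_COMMENT` and `is_comment` are my own, and as a simplification any character is accepted where the spec says `Char`.

```python
import re

# Transcription of the XML 1.0 production
#   Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Simplification (assumption): any character stands in for Char.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->", re.DOTALL)


def is_comment(text):
    """Return True if the whole of text is one well-formed XML comment."""
    return XML_COMMENT.fullmatch(text) is not None
```

Under this transcription `is_comment("<!---->")` is true, because the starred group is allowed to match zero times, while a body containing two consecutive hyphens is rejected.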
However, the SGML parser doesn't seem to work this way - instead, when parsing a document like
<!----><foo></foo>
it seems to think the comment is not closed:
?- open_string("<!----><foo></foo>", S), load_structure(S, Term, []).
ERROR: SGML2PL(sgml): []:1: Unexpected end-of-file in comment
S = $stream_reference('<stream>(0x7fa9cab6c5a0)'),
Term = [].
This can be corrected pretty easily if we go from S_CMTO directly to S_CMT after consuming a hyphen. Currently we go into some intermediate state S_CMT1, which then immediately moves into S_CMT after consuming (any) character - I'm not sure if there's a motivation for that or if it's just a bug.
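To illustrate why the extra state matters, here is a toy model of the transitions described above, written in Python rather than taken from the parser source. Only the names S_CMTO, S_CMT1 and S_CMT come from the parser; `scan_comment`, the `use_cmt1` flag and the `DASH1`/`DASH2` states are hypothetical, and the real state machine certainly has more cases than this.

```python
def scan_comment(doc, use_cmt1):
    """Return the index just past the closing '-->', or None at end of input.

    Toy model only: use_cmt1=True mimics the current behaviour described
    above, use_cmt1=False mimics going from S_CMTO straight to S_CMT.
    """
    if not doc.startswith("<!-"):
        raise ValueError("expected input starting with '<!-'")
    state = "S_CMTO"  # "<!-" has already been seen
    for i in range(3, len(doc)):
        ch = doc[i]
        if state == "S_CMTO":
            if ch != "-":
                return None  # not a comment in this toy model
            state = "S_CMT1" if use_cmt1 else "S_CMT"
        elif state == "S_CMT1":
            state = "S_CMT"  # swallows ANY character, including '-'
        elif state == "S_CMT":
            if ch == "-":
                state = "DASH1"  # hypothetical: one closing hyphen seen
        elif state == "DASH1":
            state = "DASH2" if ch == "-" else "S_CMT"
        elif state == "DASH2":  # hypothetical: two closing hyphens seen
            if ch == ">":
                return i + 1
            if ch != "-":
                state = "S_CMT"  # lenient: fall back into the comment body
    return None  # ran out of input inside the comment
```

With `use_cmt1=True`, the first of the two closing hyphens in `<!----><foo></foo>` is swallowed by S_CMT1, so the scanner never sees `-->` and runs to the end of the input, which matches the reported error. With `use_cmt1=False` the same input closes at index 7. A comment with a non-empty body, such as `<!-- hi -->`, closes either way in this model.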
My reading of the comment syntax is based on https://www.w3.org/TR/xml/#sec-comments, under which <!----> would seem to be a valid (but empty) comment.
I'll provide a pull request shortly.