Closed: simon-20 closed this issue 8 months ago.
This turned out not to be a problem with the Validator, but a bug in `curl`. `curl` has a bug that means it doesn't POST files with CR line endings in them properly unless the `--data-binary` flag is used. Without that flag, `curl` will truncate the file in ways that usually mean it stops being valid XML, and this is why the Validator was returning `400`. (`CR` line endings are valid in text files, including XML files: https://www.w3.org/TR/REC-xml/#sec-line-ends)
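As a rough illustration of why the flag matters: curl's manual documents that when `-d`/`--data` reads a file given with `@`, carriage returns and newlines are stripped from its contents, whereas `--data-binary` sends the bytes unchanged. The Python sketch below approximates only that documented stripping (it does not model URL encoding or how the server parses the body). The helper names and the sample document are hypothetical, not taken from the Validator or from curl. Well-formedness is checked with the standard library's `xml.etree.ElementTree`. With CR-only line endings, removing a CR can glue an element name to the attribute that follows it, so the result is no longer well-formed XML.

```python
import xml.etree.ElementTree as ET


def strip_like_curl_data(payload: bytes) -> bytes:
    """Approximate curl's documented `-d @file` behaviour (hypothetical helper):
    carriage returns and newlines are removed from the file contents."""
    return payload.replace(b"\r", b"").replace(b"\n", b"")


def is_well_formed(payload: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample with CR-only line endings; one CR also separates the
# element name from its attribute, which is legal XML whitespace.
sample = b'<?xml version="1.0"?>\r<iati-activities\rversion="2.03">\r</iati-activities>\r'

print(is_well_formed(sample))                        # True: bytes sent unchanged, as with --data-binary
print(is_well_formed(strip_like_curl_data(sample)))  # False: CRs stripped, as with -d @file
```

After stripping, the start tag reads `<iati-activitiesversion="2.03">`, which the parser rejects.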
Brief Description

The Validator gives an error code for any file that contains a CR character as a line break. This occurs both for files that use CR exclusively as the line break and for files that have mixed line breaks.

XML files that use a single CR character as the newline are, it would seem, valid, because a single CR was what old Macs used. The XML specification says that XML parsers must normalise line breaks to a single LF character. So this bug would seem to suggest that the XML parsers being used are not standards compliant.
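The end-of-line rule in the specification can be sketched in a few lines of Python: the two-character sequence CR LF, and any CR not followed by LF, both become a single LF before parsing. The function name `normalise_line_ends` is hypothetical. The final check assumes that Python's standard `xml.etree.ElementTree` (backed by the expat parser) applies this normalisation to character data, which is what a compliant parser should do.

```python
import xml.etree.ElementTree as ET


def normalise_line_ends(text: str) -> str:
    """Hypothetical helper applying the XML 1.0 end-of-line rule."""
    # Replace CR LF first so that its CR is not turned into a second LF,
    # then turn every remaining lone CR into LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Mixed line endings: CR LF, then a lone CR, then LF.
mixed = "line one\r\nline two\rline three\n"
print(repr(normalise_line_ends(mixed)))  # 'line one\nline two\nline three\n'

# A compliant parser should report the same normalised text for character data.
root = ET.fromstring("<note>a\r\nb\rc</note>")
print(repr(root.text))  # 'a\nb\nc'
```

The order of the two replacements matters: doing the lone-CR replacement first would turn each CR LF into two LFs.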
The files are initially processed with `xmllint`, and `xmllint` seems to handle things correctly. However, the `libxmljs2` library throws an error for files containing single CR line breaks. The exact error returned depends on where the single CR line break comes in the file. As a result, the HTTP error status returned by the Validator when encountering these files also varies. Most commonly it is `400`, but sometimes it is `422`.

This problem came to light because many, perhaps most, of UNICEF's files have both `CRLF` and single `CR` newline sequences in them.

Severity

Critical
Issue Location

The problem code is in `validatorServices.js`, but the root cause of the problem is the `libxmljs2` library used by this code.

Steps to Reproduce

Get a valid IATI XML file and alter it so that at least one of the newline sequences is a single `CR` character. Then POST it to the Validator. You will likely see a `400` error.
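The alteration step can be scripted. This is a minimal Python sketch, assuming the input file uses LF or CR LF line endings. `make_cr_variant` (a hypothetical helper) rewrites line breaks in the file's bytes as a lone CR, either only the first one or all of them, and `count_line_endings` (also hypothetical) tallies CR LF, lone CR and lone LF sequences so you can confirm the file now has the mixed endings described above. Posting the result with `curl --data-binary @file` rather than `-d @file` keeps the CRs intact.

```python
import re
from collections import Counter

# One line break: CR LF, a lone CR, or a lone LF (CR LF is tried first).
LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def count_line_endings(data: bytes) -> Counter:
    """Count CR LF, lone CR and lone LF line breaks in raw file bytes."""
    names = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}
    return Counter(names[m.group()] for m in LINE_BREAK.finditer(data))


def make_cr_variant(data: bytes, only_first: bool = True) -> bytes:
    """Rewrite line breaks as a lone CR (hypothetical helper for reproducing the issue).

    With only_first=True just the first line break is changed, giving a file
    with mixed line endings; otherwise every line break becomes CR.
    """
    return LINE_BREAK.sub(b"\r", data, count=1 if only_first else 0)


# Hypothetical minimal input with CR LF line endings.
original = b'<?xml version="1.0"?>\r\n<iati-activities version="2.03">\r\n</iati-activities>\r\n'
mixed = make_cr_variant(original)
print(count_line_endings(original))  # Counter({'CRLF': 3})
print(count_line_endings(mixed))     # Counter({'CRLF': 2, 'CR': 1})
```

Working on bytes rather than decoded text avoids Python's universal-newline translation, which would otherwise hide the CRs.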