stephentalley closed this issue 1 month ago.
OK, but it makes little to no sense to encode these characters as entities in the XML output. The general consensus is to avoid them at all costs in filenames (pathnames) to prevent problems with tools and portability, and some tools and some operating systems forbid them in pathnames. For that reason, I never deemed it necessary to treat them as special in XML.
See also: https://en.wikipedia.org/wiki/Filename
Users don't always control the names of the files they need to grep.
But more importantly, if it is a legal file name for the file system, then it should probably be supported by the tool.
IMO only & and the double quote (") should be escaped in XML attributes. XML is forgiving when < and > are used in attributes, which rarely if ever leads to interoperability issues, since there are no tags inside XML attributes.
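A quick way to test these escaping rules is to feed candidate attributes to a conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` (backed by expat). A literal `>` inside an attribute value is accepted, but the XML 1.0 grammar for attribute values also excludes a literal `<`, not only `&` and the delimiting quote. Literal tabs, newlines, and carriage returns are legal but are normalized to spaces by the parser, so a pathname containing them only survives as a character reference. The helper names `escape_attr` and `round_trips` are hypothetical, and this is not how ugrep implements its escaping.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str) -> str:
    """Escape a pathname for use inside a double-quoted XML attribute."""
    value = value.replace("&", "&amp;")   # first, so later entities stay intact
    value = value.replace("<", "&lt;")    # '<' is not allowed in attribute values
    value = value.replace('"', "&quot;")  # the delimiting quote
    # Literal tab/LF/CR would be normalized to spaces by the parser;
    # character references preserve them.
    for ch in "\t\n\r":
        value = value.replace(ch, f"&#{ord(ch)};")
    return value


def round_trips(doc: str, name: str) -> bool:
    """True if doc is well-formed and its name attribute equals name exactly."""
    try:
        return ET.fromstring(doc).get("name") == name
    except ET.ParseError:
        return False


names = ["a&b.txt", "a<b.txt", "a>b.txt", 'say "hi".txt', "two\nlines.txt"]
for name in names:
    raw = f'<file name="{name}"/>'
    fixed = f'<file name="{escape_attr(name)}"/>'
    print(repr(name), "raw:", round_trips(raw, name), "escaped:", round_trips(fixed, name))
```

Control characters other than tab, newline, and carriage return cannot appear in XML 1.0 at all, not even as character references. A pathname containing them would still need a separate substitution strategy.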
OK, I've added new fields %i and %I in the upcoming ugrep release to output pathnames in XML. The --xml format will use %I instead of %H, as follows:
--format-begin='<grep>%~' --format-open=' <file%["]$%[ name="]I>%~' --format=' <match%["]$%[ line="]N%[ column="]K%[ offset="]B>%X</match>%~%u' --format-close=' </file>%~' --format-end='</grep>%~'
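To show the shape of document this template produces, here is a Python sketch that mimics its structure: a `<grep>` root, one `<file name="...">` element per file, and one `<match>` element with line, column, and offset attributes per match. The sketch then parses the result back. It is not ugrep's code. `render_grep_xml` and `ATTR_ENTITIES` are hypothetical names, and the sketch assumes that %I yields the pathname with `&`, `<`, `>`, and `"` escaped. It relies on `xml.sax.saxutils.escape`, which always escapes `&`, `<`, and `>` and accepts a dict of extra replacements.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Extra replacements for a double-quoted attribute; escape() itself
# already handles &, < and >.
ATTR_ENTITIES = {'"': "&quot;", "\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}


def render_grep_xml(results):
    """results: list of (pathname, [(line, column, offset, text), ...])."""
    out = ["<grep>\n"]
    for path, matches in results:
        out.append(f' <file name="{escape(path, ATTR_ENTITIES)}">\n')
        for line, column, offset, text in matches:
            out.append(f' <match line="{line}" column="{column}" '
                       f'offset="{offset}">{escape(text)}</match>\n')
        out.append(" </file>\n")
    out.append("</grep>\n")
    return "".join(out)


doc = render_grep_xml([("Tom & Jerry's <notes>.txt", [(3, 1, 42, "a < b")])])
root = ET.fromstring(doc)          # raises ParseError if the XML were invalid
f = root.find("file")
print(f.get("name"))               # Tom & Jerry's <notes>.txt
print(f.find("match").get("offset"))
print(f.find("match").text)
```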
When ugrep creates XML output, file names with XML special characters cause the XML to be invalid:
These characters should be escaped when part of the file name:
This, for example, would be valid XML:
Are there attributes other than name that should be considered as well? Thanks again for your work on this tool!