Closed praschke closed 3 months ago
after doing some further research, i've found that ID attributes can contain colons in XML 1.0, SVG 1.0, and HTML5, and that the reason people advise against it is that IDs containing colons have to be escaped in CSS selectors. I haven't found any other reason to avoid colons.
this is fortunate as changing the IDs would require many more changes to downstream tools, compared to removing the identical and uninformative DTD embedded in thousands of files. (almost 8MB!)
The current KanjiVG SVG is already fully compliant with the SVG 1.0 standard.
it is not valid if embedded DTDs are not parsed, which is now common security practice.
i have renamed it to be less offensive.
I am facing the same issue, I can't parse the current SVG files, and the changes made by this MR allows me to parse them (I am using an XML parser: https://erratique.ch/software/xmlm/doc/Xmlm/index.html).
I think it would be cool to have this MR to make the SVG files parsed more easily by many tools out there.
I am not sure about the cons: Maybe it is hard to make sure that there were no other changes than the one suggested in this PR description? If so, it could be helpful to have some commands (I guess sed
command?) so that we can replay the changes locally and check we get the same output.
the two commands i used were, in the kanji
directory:
fd . -e svg -x perl -i -0pe 's$ \[\n<!ATTLIST g\nxmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net"\nkvg:element CDATA #IMPLIED\nkvg:variant CDATA #IMPLIED\nkvg:partial CDATA #IMPLIED\nkvg:original CDATA #IMPLIED\nkvg:part CDATA #IMPLIED\nkvg:number CDATA #IMPLIED\nkvg:tradForm CDATA #IMPLIED\nkvg:radicalForm CDATA #IMPLIED\nkvg:position CDATA #IMPLIED\nkvg:radical CDATA #IMPLIED\nkvg:phon CDATA #IMPLIED >\n<!ATTLIST path\nxmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net"\nkvg:type CDATA #IMPLIED >\n\]$$g' {}
fd . -e svg -x sed -i 's$xmlns="http://www.w3.org/2000/svg"$xmlns="http://www.w3.org/2000/svg" xmlns:kvg="http://kanjivg.tagaini.net"$' {}
you can rewrite it to use find
instead of fd
if you do not have fd
installed. it is not very difficult to verify that these were the only changes made in the large commit.
I understand that some people are having trouble with the KanjiVG files with particular software applications. However, as I have repeatedly pointed out, according to the W3C SVG validator, the KanjiVG files are valid SVG 1.0, and it is easy and permitted for users to alter the files using, for example, your script, with the exception of the copyright notice, which should not be altered.
I am facing the same issue, I can't parse the current SVG files, and the changes made by this MR allows me to parse them (I am using an XML parser: https://erratique.ch/software/xmlm/doc/Xmlm/index.html).
I'm not prepared to go to this web page and try to download and install and then try out the XML parser that you are using. If there is a specific error from the XML parser that you are using then please inform what it is. Ideally this should be done under the "issues" page rather than as a comment on this pull request. The current KanjiVG files, as far as I know, are valid SVG 1.0.
I think it would be cool to have this MR to make the SVG files parsed more easily by many tools out there.
The one bug report about the problem with parsing the current files was so vague and non-specific and verbose as to be completely useless. The person behind that travesty seemed to think that I was much more interested in reading his endlessly-edited comments across several different websites than I am. If there is a problem with the current format of the files, please report that as a specific issue. The current KanjiVG is SVG 1.0 and I brought up the topic of upgrading to SVG 1.1 on the mailing list, but this resulted in yet more of the useless verbiage and nothing which I can take action on, let alone a consensus to do something.
I am not sure about the cons: Maybe it is hard to make sure that there were no other changes than the one suggested in this PR description? If so, it could be helpful to have some commands (I guess
sed
command?) so that we can replay the changes locally and check we get the same output.
The cons in this case are that we end up making unnecessary changes to a large number of files on the say-so of random people on the internet who cannot back up what they are saying with documentation in the SVG standards as to how the current files are incorrect.
After checking with the W3C validator here: https://validator.w3.org/check , it seems this MR produces invalid SVG 1.0 standard files. Maybe, one should just consider adding the attribute: xmlns:kvg="http://kanjivg.tagaini.net"
in the SVG tag and not removing the ATTLIST
tag?
The difference with adding this attribute is that it makes the file XML compliant according to this XML validator: https://jsonformatter.org/xml-validator
@benkasminbullock i explained the issue in my earlier comments on this PR, and corrected the first reporter's misinformation regarding the validity of colons in ids, which would be a much more disruptive change.
you may be correct that this is not an issue of strict compliance, but embedded DTDs are now commonly not parsed for security and performance reasons. because of this, these files will now fail to validate in many parsers. this is a usability issue.
if you are worried about breaking unknown projects that depend on the embedded DTD for parsing, then would a PR that only moves the namespace declarations out of the DTD and into the root element be acceptable?
@francoisthire the w3c validator requires files to be valid under specific recognized schemas with no allowance for extensibility. unrecognized namespaces are incorrectly treated as invalid.
This is the result of trying to parse the SVG in KanjiVG using the SVG 1.1 standard. The suggested fix in this pull request is not adequate in my opinion. We should either change the entire repository to SVG 1.1 or at least offer a release with SVG 1.1 versions of the files.
Unfortunately my comment above is mistaken, this is nothing to do with the SVG 1.1 standard. The ATTLIST and the kvg: prefix information are perfectly valid SVG 1.1. Removing the ATTLIST and the kvg: prefixed information on the groups and strokes is not an acceptable solution to the failures of people's XML parsing software. The only thing which I could do to help would be to offer a release of KanjiVG with the attributes removed for the use of people working with these kinds of XML parsers.
Closes #451.