Closed sebastic closed 3 years ago
It does seem unfortunate that netcdf is using an xml parser that hasn't been updated in over 15 years.
I found nothing that came close in size -- 1 .c and 1 .h files -- and with a usable license.
I went thru this list again to see what was new: https://awesomeopensource.com/projects/xml-parser Possible replacements: xtmlractor, xml
Why not just link to libxml2? Don't include an XML library in your source tree.
And and yet another burden to our users? There is a burden when depending on other libraries. Some are positive since we do not have to maintain them, Some are negative -- our users have to install it and we have to keep up with API changes. For something as trivial as an XML parser, I personally do not think the burden of libxml is worth it; it is overkill for our limited use.
I would argue that there is nothing particularly simple about parsing XML, particularly in a safe manner. Also, packaging/building libraries is what distributions and sysadmins are meant to do and is not rocket science. It's the 21st century, let's take advantage of it.
Sorry, and I do not mean to be rude. But my experience is that despite what we wish were true, the reality is different. Maintaining a system across multiple platforms: at least Linux, BSD, OSX, Windows, Cygwin, Mingw and various supercomputers is a difficult task. It is unfortunate but true that not very many libraries work across that many platforms. Libxml might be one of those, I do not know.
My recent experience trying to support the aws-sdk-cpp library across just Linux and Windows is a good example. It is overkill, but I am forced to use it. After much pain, I finally got it to build on a specific version of Ubuntu, but not on earlier versions of Ubuntu. I still do not have it working on Windows. God knows what will happen with other versions of Linux, or OSX.
It would be nice if the C eco-system were as clean as Python or Java, but it is not. With respect to your comment: "...It's the 21st century, let's take advantage of it." I wish the C eco-system was in the 21st century but it is not. Unidata netcdf-c needs to be very careful about external dependencies.
libxml2 is everywhere where GNOME is, this includes Linux, BSD, OSX, and Windows:
http://xmlsoft.org/downloads.html
It's a good example of a cross-platform library.
Your experience says more about the (lack of) portability of aws-sdk-cpp and means nothing with respect to libxml2.
With respect to dependencies, hdf5 is a very problematic one. It does not support building both parallel and serial versions at the same time, and required far too much customization in its Debian package. There is also no public VCS nor bugtracker, bitbucket.hdfgroup.org & jira.hdfgroup.org are recent improvements but are still not good enough.
<thinking out loud>
The primary issue here is that the bundled XML interface has a number of CVE's, and is very out of date.
The ideal solution would be to link against an external interface, libxml2
. On *nix systems with modern package management, installing dependencies when installing a package becomes essentially a non-issue; [package manager] install libnetcdf
would ensure libxml2
were installed along with everything else.
The confounding factor, as usual, is Windows. The link provided above, thanks @sebastic, leads me through a couple levels of links and ultimately takes me to 32-bit Windows versions from 2011, and 64-bit versions from 2015. Not the solution we are looking for.
On Windows, we provide binary installers for netCDF which bundle libcurl
, libz
, libhdf5
, etc, because otherwise the burden of installing the appropriate dependencies becomes non-trivial for the end user. And while we are often able to walk non-developers through the fairly rote process of installing from source on *nix systems, I really shudder to imagine trying to do the same on Windows.
So the situation, in summary, is as follows:
libxml2
as part of our process on Windows, similar to how we bundle libhdf5
, etc? I don't have an answer to this yet, but I will investigate. If we can, great, we can continue the discussion about what that would take from our side. If we can't, it becomes a much tricker issue to link against libxml2
. xtmlractor
or xml
leap out at you?I think we are all on the same page when it comes to the urgency to address these CVEs. The issue, as always, is how do we solve this in a way that doesn't leave our Windows/Visual Studio users in a tough spot.
<\thinking out loud>
Sorry to get my back up over this. I will switch to libxml if you want. But I just cannot see that it is worth it for the limited use we make of a read-only XML parser.
@DennisHeimbigner No issue, I was just outlining my thoughts above while I was looking at the issue. The burden of dependencies on our users is a big one, and the lack of any sort of universal package manager for binary installation makes it a particularly sticky issue for Windows. Thanks for getting the issues as outlined resolved!
THis issue can be closed
I'm not sure about closing this. I found several issues:
exit(-1)
when the test succeeds. This is most likely not desired when run from within netcdf()strspn()
and strcspn()
are returning the desired offset or the offset to the end of the string. In many cases it is just assumed that the string will continue if more data is expected. With all this said, how much do these issues matter in the context of netcdf-c? So far, this code is only used in the context of DAP4 to transmit information from a dap4 server to a client using netcdf-c. It can be expected that the XML data passed down from a DAP4 server is valid when talking to a legitimate server. In this scenario most of the reported issues seem to be irrelevant. As long as the DAP4 server is confined to a secured private network, no issues are to be expected - however, there are DAP4 servers available on the 'public' internet. If these networks provide authentication (https) and the user has authentication enabled (libcurl default), the risk of a spoofed address or man-in-the-middle may be low therefore, most of the issues won't be relevant. However, some servers may not offer https, or may use a self-signed certificate which may not be installed on the client system, thus, users may be tempted to disable authentication for convenience. The missing result checks on [re/m]alloc() are different: the missing checks will cause the program to fail in some random way without signalling an out-of-memory condition back to the user. This is not a problem that is limited to large data provided thru an insecure connection, this problem can be caused by improperly set application limits as well. How much this is an issue depends on how much effort is done in the rest of the code to handle this situation. One may ask how relevant is this and how much damage may occur. I don't feel competent to answer this question and therefore would rather error on the safe side. With all this in mind, it should be considered if the use of a well-curated XML library should not be offered as a build time option at least.
Thanks, I should have thought of that. Just because I do not want to depend on libxml2 does mean I should disallow its use if available.
Thanks, I should have thought of that. Just because I do not want to depend on libxml2 does mean I should disallow its use if available.
I've looked into this a bit over the weekend. It seems to be doable and libxml2 looks like a good option: it is a DOM parser like ezxml. Other parsers, like xpat, are streaming parsers using callbacks which would make an integration extremely difficult. I cannot promise when it will be done as I'm only able to work on it on the side.
Don't bother. I already have start on this.
It seems that libxml2 is falling away from current maintenance. https://gitlab.gnome.org/GNOME/libxml2/-/issues/319
I started #2169 to consider the choice for the default XML parser for netcdf. Currently this is tinyxml2 versus libxml2. Commenters, if you have any insights related to this choice, please add to #2169. Thank you.
As reported by Moritz Muehlenhoff in Debian Bug #989360: