Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License
516 stars 262 forks source link

Multiple security issues in ezxml #2119

Closed sebastic closed 3 years ago

sebastic commented 3 years ago

As reported by Moritz Muehlenhoff in Debian Bug #989360:

Multiple security issues were found in ezxml, which netcdf bundles:

CVE-2021-31598: https://sourceforge.net/p/ezxml/bugs/28/

CVE-2021-31348 / CVE-2021-31347: https://sourceforge.net/p/ezxml/bugs/27/

CVE-2021-31229: https://sourceforge.net/p/ezxml/bugs/26/

CVE-2021-30485: https://sourceforge.net/p/ezxml/bugs/25/

CVE-2021-26222: https://sourceforge.net/p/ezxml/bugs/22/

CVE-2021-26221: https://sourceforge.net/p/ezxml/bugs/21/

CVE-2021-26220: https://sourceforge.net/p/ezxml/bugs/23/

CVE-2019-20202: https://sourceforge.net/p/ezxml/bugs/17

CVE-2019-20201 https://sourceforge.net/p/ezxml/bugs/16

CVE-2019-20200: https://sourceforge.net/p/ezxml/bugs/19

CVE-2019-20199: https://sourceforge.net/p/ezxml/bugs/18

CVE-2019-20198: https://sourceforge.net/p/ezxml/bugs/20

CVE-2019-20007: https://sourceforge.net/p/ezxml/bugs/13

CVE-2019-20006: https://sourceforge.net/p/ezxml/bugs/15

CVE-2019-20005: https://sourceforge.net/p/ezxml/bugs/14

opoplawski commented 3 years ago

It does seem unfortunate that netcdf is using an xml parser that hasn't been updated in over 15 years.

DennisHeimbigner commented 3 years ago

I found nothing that came close in size -- 1 .c and 1 .h files -- and with a usable license.

DennisHeimbigner commented 3 years ago

I went thru this list again to see what was new: https://awesomeopensource.com/projects/xml-parser Possible replacements: xtmlractor, xml

sebastic commented 3 years ago

Why not just link to libxml2? Don't include an XML library in your source tree.

DennisHeimbigner commented 3 years ago

And and yet another burden to our users? There is a burden when depending on other libraries. Some are positive since we do not have to maintain them, Some are negative -- our users have to install it and we have to keep up with API changes. For something as trivial as an XML parser, I personally do not think the burden of libxml is worth it; it is overkill for our limited use.

opoplawski commented 3 years ago

I would argue that there is nothing particularly simple about parsing XML, particularly in a safe manner. Also, packaging/building libraries is what distributions and sysadmins are meant to do and is not rocket science. It's the 21st century, let's take advantage of it.

DennisHeimbigner commented 3 years ago

Sorry, and I do not mean to be rude. But my experience is that despite what we wish were true, the reality is different. Maintaining a system across multiple platforms: at least Linux, BSD, OSX, Windows, Cygwin, Mingw and various supercomputers is a difficult task. It is unfortunate but true that not very many libraries work across that many platforms. Libxml might be one of those, I do not know.

My recent experience trying to support the aws-sdk-cpp library across just Linux and Windows is a good example. It is overkill, but I am forced to use it. After much pain, I finally got it to build on a specific version of Ubuntu, but not on earlier versions of Ubuntu. I still do not have it working on Windows. God knows what will happen with other versions of Linux, or OSX.

It would be nice if the C eco-system were as clean as Python or Java, but it is not. With respect to your comment: "...It's the 21st century, let's take advantage of it." I wish the C eco-system was in the 21st century but it is not. Unidata netcdf-c needs to be very careful about external dependencies.

sebastic commented 3 years ago

libxml2 is everywhere where GNOME is, this includes Linux, BSD, OSX, and Windows:

http://xmlsoft.org/downloads.html

It's a good example of a cross-platform library.

Your experience says more about the (lack of) portability of aws-sdk-cpp and means nothing with respect to libxml2.

sebastic commented 3 years ago

With respect to dependencies, hdf5 is a very problematic one. It does not support building both parallel and serial versions at the same time, and required far too much customization in its Debian package. There is also no public VCS nor bugtracker, bitbucket.hdfgroup.org & jira.hdfgroup.org are recent improvements but are still not good enough.

WardF commented 3 years ago

<thinking out loud>

The primary issue here is that the bundled XML interface has a number of CVE's, and is very out of date.

The ideal solution would be to link against an external interface, libxml2. On *nix systems with modern package management, installing dependencies when installing a package becomes essentially a non-issue; [package manager] install libnetcdf would ensure libxml2 were installed along with everything else.

The confounding factor, as usual, is Windows. The link provided above, thanks @sebastic, leads me through a couple levels of links and ultimately takes me to 32-bit Windows versions from 2011, and 64-bit versions from 2015. Not the solution we are looking for.

On Windows, we provide binary installers for netCDF which bundle libcurl, libz, libhdf5, etc, because otherwise the burden of installing the appropriate dependencies becomes non-trivial for the end user. And while we are often able to walk non-developers through the fairly rote process of installing from source on *nix systems, I really shudder to imagine trying to do the same on Windows.

So the situation, in summary, is as follows:

  1. There are CVEs in the existing bundled parser; this needs to be addressed.
  2. Can we easily compile libxml2 as part of our process on Windows, similar to how we bundle libhdf5, etc? I don't have an answer to this yet, but I will investigate. If we can, great, we can continue the discussion about what that would take from our side. If we can't, it becomes a much tricker issue to link against libxml2.
  3. In the short term, can we switch to one of the more modern parsers that @DennisHeimbigner suggested? This may be the path of least resistance to solve point 1, while evaluating point 2. @DennisHeimbigner, did either xtmlractor or xml leap out at you?

I think we are all on the same page when it comes to the urgency to address these CVEs. The issue, as always, is how do we solve this in a way that doesn't leave our Windows/Visual Studio users in a tough spot.

<\thinking out loud>

DennisHeimbigner commented 3 years ago
  1. I already fixed the ezxml bugs as part of an upcoming PR.
  2. For our purposes, libxml is overkill; we only need very simple xml parsing.
  3. xmltractor and xml both are still pretty old (2017?); no idea what bugs they contain.
DennisHeimbigner commented 3 years ago

Sorry to get my back up over this. I will switch to libxml if you want. But I just cannot see that it is worth it for the limited use we make of a read-only XML parser.

WardF commented 3 years ago

@DennisHeimbigner No issue, I was just outlining my thoughts above while I was looking at the issue. The burden of dependencies on our users is a big one, and the lack of any sort of universal package manager for binary installation makes it a particularly sticky issue for Windows. Thanks for getting the issues as outlined resolved!

DennisHeimbigner commented 3 years ago

THis issue can be closed

e4t commented 3 years ago

I'm not sure about closing this. I found several issues:

  1. The upstream patches only cover three of the reported issues - namely
    • CVE-2021-31347 & CVE-2021-31348 - bug@27
    • CVE-2021-31229 - bug#26
    • CVE-2021-30485 - bug#25
  2. The fix for CVE-2021-31598 is bogus, it causes most of the well-formed and valid XML data I fed to it to to fail, moreover it calls exit(-1) when the test succeeds. This is most likely not desired when run from within netcdf()
  3. The remaining CVEs are not addressed at all: To fix them, I went ahead and performed a deeper analysis of the ezXML code:
    • The EzXML code is in a PoC state. It assumes that:
      • XML data passed to it is always valid - from this it makes assumptions of the data to follow often without checking. This will lead to out-of-bound memory accesses.
      • memory allocation never fails.
      • Moreover, it seems the code was written with the goal to minimize LOCs. To achieve this, it frequently uses nested assignments without checking the return values of the innermost function calls. This nesting also makes the code harder to read and understand.
      • Most inner functions do not have a way to report back failure and pass this information all the way up to the caller.
      • CVE-2019-20201 bug#16, CVE-2019-20198 bug#20, CVE-2021-31229 bug#26 all have the same root cause, however, the affected code is not used in netcdf-c. These are therefore irrelevant.
      • CVE-2019-20006 bug#15, CVE-2019-20202 bug#17, CVE-2021-31598 bug#28 all share the same root cause.
      • CVE-2021-31229 bug#26 points to an entire class of problems arising from not checking whether strspn() and strcspn() are returning the desired offset or the offset to the end of the string. In many cases it is just assumed that the string will continue if more data is expected.
      • CVE-2021-26221 bug#21, CVE-2021-26222 bug#22 and CVE-2021-26220 bug#23 represent the missing tests for failed memory (re)allocations mentioned above.
      • CVE-2019-20005 bug#14 cannot be reproduced with the reproducer provided.
      • I've fixed the issues not already addressed (see my comments in the sourceforge tickets above as well ass pull request #2133) and ported these fixes to netcdf-c. So far, for the classes of problems listed above these fixes cover the reproducers supplied in the bugs, they do not yet address all other occurrences.

With all this said, how much do these issues matter in the context of netcdf-c? So far, this code is only used in the context of DAP4 to transmit information from a dap4 server to a client using netcdf-c. It can be expected that the XML data passed down from a DAP4 server is valid when talking to a legitimate server. In this scenario most of the reported issues seem to be irrelevant. As long as the DAP4 server is confined to a secured private network, no issues are to be expected - however, there are DAP4 servers available on the 'public' internet. If these networks provide authentication (https) and the user has authentication enabled (libcurl default), the risk of a spoofed address or man-in-the-middle may be low therefore, most of the issues won't be relevant. However, some servers may not offer https, or may use a self-signed certificate which may not be installed on the client system, thus, users may be tempted to disable authentication for convenience. The missing result checks on [re/m]alloc() are different: the missing checks will cause the program to fail in some random way without signalling an out-of-memory condition back to the user. This is not a problem that is limited to large data provided thru an insecure connection, this problem can be caused by improperly set application limits as well. How much this is an issue depends on how much effort is done in the rest of the code to handle this situation. One may ask how relevant is this and how much damage may occur. I don't feel competent to answer this question and therefore would rather error on the safe side. With all this in mind, it should be considered if the use of a well-curated XML library should not be offered as a build time option at least.

DennisHeimbigner commented 3 years ago

Thanks, I should have thought of that. Just because I do not want to depend on libxml2 does mean I should disallow its use if available.

e4t commented 3 years ago

Thanks, I should have thought of that. Just because I do not want to depend on libxml2 does mean I should disallow its use if available.

I've looked into this a bit over the weekend. It seems to be doable and libxml2 looks like a good option: it is a DOM parser like ezxml. Other parsers, like xpat, are streaming parsers using callbacks which would make an integration extremely difficult. I cannot promise when it will be done as I'm only able to work on it on the side.

DennisHeimbigner commented 3 years ago

Don't bother. I already have start on this.

Dave-Allured commented 2 years ago

It seems that libxml2 is falling away from current maintenance. https://gitlab.gnome.org/GNOME/libxml2/-/issues/319

I started #2169 to consider the choice for the default XML parser for netcdf. Currently this is tinyxml2 versus libxml2. Commenters, if you have any insights related to this choice, please add to #2169. Thank you.