alpha-asp / Alpha

A lazy-grounding Answer-Set Programming system
BSD 2-Clause "Simplified" License
58 stars 10 forks source link

cli: Upgrade Log4j #317

Closed lorenzleutgeb closed 2 years ago

lorenzleutgeb commented 2 years ago

CVE-2021-44228 is making news. AFAIK Alpha is not vulnerable, and, as a CLI application, also not really a target, but let's update Log4j anyway.

codecov[bot] commented 2 years ago

Codecov Report

Merging #317 (7593853) into master (21d074a) will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #317   +/-   ##
=========================================
  Coverage     69.51%   69.51%           
  Complexity     2092     2092           
=========================================
  Files           182      182           
  Lines          8006     8006           
  Branches       1416     1416           
=========================================
  Hits           5565     5565           
  Misses         2086     2086           
  Partials        355      355           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 21d074a...7593853. Read the comment docs.

AntoniusW commented 2 years ago

Thank you for this patch, I think it is the first step in the right direction but should not be the last one.

CVE-2021-44228 is making news. AFAIK Alpha is not vulnerable, and, as a CLI application, also not really a target, but let's update Log4j anyway.

I have not tried to build a POC but am pretty sure that Alpha is vulnerable. The problem is that Log4j does variable substitution on the final logging string. So if we have Logger.info("Something happened on the {} atom.", at); and the user supplies the solver with a fact p("just a nice string constant containing ${jndi:ldap://.../a} stuff"). and manages to trigger Alpha to just log the atom (e.g., by triggering an oops()), then you have your exploit.

Regarding the first step, what else we have to do: make sure that no library used by Alpha uses an old log4j implementation. @lorenzleutgeb is there an easy way to check we don't build/ship an older log4j implementation transitively included via dependencies?

lorenzleutgeb commented 2 years ago

Regarding the first step, what else we have to do: make sure that no library used by Alpha uses an old log4j implementation. @lorenzleutgeb is there an easy way to check we don't build/ship an older log4j implementation transitively included via dependencies?

$ for PROJECT in api cli-app commons core solver \
  do \
    gradle --quiet alpha-${PROJECT}:dependencyInsight \
      --configuration runtimeClasspath \
      --dependency "org.apache.logging.log4j:log4j-core" \
  done
No dependencies matching given input were found in configuration ':alpha-api:runtimeClasspath'
org.apache.logging.log4j:log4j-core:2.15.0
   variant "runtime" [
      org.gradle.status              = release (not requested)
      org.gradle.usage               = java-runtime
      org.gradle.libraryelements     = jar
      org.gradle.category            = library

      Requested attributes not found in the selected variant:
         org.gradle.dependency.bundling = external
         org.gradle.jvm.environment     = standard-jvm
         org.gradle.jvm.version         = 8
   ]

org.apache.logging.log4j:log4j-core:2.15.0
\--- org.apache.logging.log4j:log4j-slf4j-impl:2.15.0
     \--- runtimeClasspath

No dependencies matching given input were found in configuration ':alpha-commons:runtimeClasspath'
No dependencies matching given input were found in configuration ':alpha-core:runtimeClasspath'
No dependencies matching given input were found in configuration ':alpha-solver:runtimeClasspath'

Generally, there should be just one implementation of Log4j on the classpath for any application. The only application project in this repository is alpha-cli-app, and it is the only one for which there's Log4j on the runtime classpath.

AntoniusW commented 2 years ago

@lorenzleutgeb Thanks for the extra check! Let's hope that this was the only way to trigger variable substitution and lookup across servers deep inside the logging frameworks. :)

AntoniusW commented 2 years ago

This seems to be a gift that keeps on giving, log4j 2.16.0 is here, fixing some more substitution/lookup loopholes: https://logging.apache.org/log4j/2.x/security.html

Shall we rip out log4j right now, or are we gonna play this game for several iterations?

lorenzleutgeb commented 2 years ago

Well, it's very simple to bump the version, and Log4j still is a very good logging library, so let's just go with it.

AntoniusW commented 2 years ago

The main problem I see here is that those RCE's are not just some bugs now becoming apparent, but actually Log4j simply working as intended/specified. The philosophy behind this very useful logging framework seems to be deficient. Hence, I suspect that there are many more points where no one saw a problem with executing code coming from another server, without ensuring that the received code is actually what one wants.

madmike200590 commented 2 years ago

While I do see your point on this being a design flaw (the whole lookup feature of log4j enabling this seems pretty weird to me and sounds like asking for trouble..), I'd like to point out that Alpha is not at risk from this:

lorenzleutgeb commented 2 years ago

I only replied to parts of https://github.com/alpha-asp/Alpha/pull/317#issuecomment-992762087 in https://github.com/alpha-asp/Alpha/pull/317#issuecomment-992840785 because within the scope of Alpha, our reactions so far IMO are timely, effective and sufficient. The Log4j vulnerabilities draw a lot of precious attention. And if Log4j decides to release a handful new versions, so be it. I'll PR the upgrade.

As for ecosystem-wide stopgaps as suggested in

Regarding the first step, what else we have to do: make sure that no library used by Alpha uses an old log4j implementation.

This is just out of scope for us, as @madmike200590 correctly argues.

These measures make more sense in build systems, such as Gradle, see https://github.com/gradle/gradle/commit/96f40395de12c2550d0f5a578ce8392846d45b91. And it's good when distributions update quickly, see e.g. NixOS (as I happen to maintain Gradle on NixOS) https://github.com/NixOS/nixpkgs/pull/150866

Let's also update Alpha to Gradle 7.3.2 (PR at https://github.com/alpha-asp/Alpha/pull/321).

The main problem I see here is that those RCE's [sic] are not just some bugs now becoming apparent, but actually Log4j simply working as intended/specified. The philosophy behind this very useful logging framework seems to be deficient. Hence, I suspect that there are many more points where no one saw a problem with executing code coming from another server, without ensuring that the received code is actually what one wants.

TBH I don't know the history behind the RCE, and why these kinds of substitutions were enabled by default, but I'd venture the guess that this was more oversight than conscious decision (happy to be educated here, if you have any references). So yes, the spec is wrong, but that's human error. Happens.

To steer this in a more constructive direction, if you really consider "ripping out" Log4j, please suggest an alternative that you would be more comfortable with. We still want to log things, don't we?

AntoniusW commented 2 years ago

Thank you both for this discussion, I heavily disagree with some arguments and fully agree with some others, and I especially appreciate the different perspectives.

@madmike200590:

While I do see your point on this being a design flaw (the whole lookup feature of log4j enabling this seems pretty weird to me and sounds like asking for trouble..), I'd like to point out that Alpha is not at risk from this:

To fix the design flaw, I only see two approaches for log4j, both are likely not fully possible. First, one could simply disable almost all variable substitutions (or just allow constant ones like ${java_version}), which probably causes many problems with all kinds of usages of log4j. Second, they could really fix the design by introducing a separation of formatString and rawData, where rawData never gets evaluated/substituted in any way. This looks to me like the sane way to move forward, but it shifts the burden of separating data from format to each and every user of log4j. I would be very surprised if they could actually do that.

* While your exploit sketch described above would definitely work, it would be useless - Since it's a command-line application that is run on infrastructure you control yourself, you can use the exploit to execute arbitrary code on your own machine, but nothing else (I found [this explanation](https://www.youtube.com/watch?v=0-abhd-CLwQ) very helpful in gaining a better understanding of the CVE).

I strongly disagree, because there is a non-zero likelihood that someone decides to run Alpha in the back-end of their web-facing service and interfaces with the cli-app instead of an Alpha object from within Java. If the rest of the web-facing service is not written in Java, I think it is highly likely that one rather uses the cli-app instead of switching their project to Java. In such a case, we just delivered an RCE opportunity.

* Any server-based application running Alpha as a component of a larger application would not use the `alpha-cli-app` module, but depend on `alpha-solver`, i.e. it would supply its own logging implementation, since every non-executable component of Alpha only uses slf4j, but does not specify any specific implementation. Vulnerability of such applications is therefore outside the scope of our influence since we cannot force other applications to use specific logging frameworks.

Good point, very good point. Alpha outside of the cli-app uses slf4j and its implementation is decided by those using Alpha. I am fine with that.

@lorenzleutgeb :

I only replied to parts of #317 (comment) in #317 (comment) because within the scope of Alpha, our reactions so far IMO are timely, effective and sufficient. The Log4j vulnerabilities draw a lot of precious attention. And if Log4j decides to release a handful new versions, so be it. I'll PR the upgrade.

Thank you for the work on this front! I also think that our reaction was timely, though I guess there will be more issues arising from logging in the next weeks and months.

As for ecosystem-wide stopgaps as suggested in

Regarding the first step, what else we have to do: make sure that no library used by Alpha uses an old log4j implementation.

This is just out of scope for us, as @madmike200590 correctly argues.

These measures make more sense in build systems, such as Gradle, see gradle/gradle@96f4039. And it's good when distributions update quickly, see e.g. NixOS (as I happen to maintain Gradle on NixOS) NixOS/nixpkgs#150866

Let's also update Alpha to Gradle 7.3.2 (PR at #321).

I fully agree, this is a good step in the right direction. Probably not the final solution yet, but goes in the right direction.

The main problem I see here is that those RCE's [sic] are not just some bugs now becoming apparent, but actually Log4j simply working as intended/specified. The philosophy behind this very useful logging framework seems to be deficient. Hence, I suspect that there are many more points where no one saw a problem with executing code coming from another server, without ensuring that the received code is actually what one wants.

TBH I don't know the history behind the RCE, and why these kinds of substitutions were enabled by default, but I'd venture the guess that this was more oversight than conscious decision (happy to be educated here, if you have any references). So yes, the spec is wrong, but that's human error. Happens.

To me this does not look like an oversight, but more like a general problem. In log4j there are many ways of doing lookups (see https://logging.apache.org/log4j/2.x/manual/lookups.html ) and evaluations on the log message are done repeatedly allowing many functions to be combined. For example to avoid an easy detection of the ${jndi:ldap:...} pattern, exploits use strings like ${${::-j}${::-n}${::-d}${::-I}... which evaluate to the above jndi after several rounds of log-message evaluations. Similarly, ldap is not the only way to get data out or in. Exploits use rmi and dns, too (see https://news.sophos.com/en-us/2021/12/12/log4shell-hell-anatomy-of-an-exploit-outbreak/ ). Some people rant that this is actually deeply rooted in Java, for example this commentary in German, read its second page https://www.heise.de/meinung/Kommentar-zu-log4j-Es-funktioniert-wie-spezifiziert-6294476.html . I agree with this analysis, not the ranting, but the analysis on page 2.

To steer this in a more constructive direction, if you really consider "ripping out" Log4j, please suggest an alternative that you would be more comfortable with. We still want to log things, don't we?

I am seriously considering whether a simple write-to-stderr is the right choice for the cli-app. Since it is a cli application, the caller should know what they want to do with stuff printed on stderr. Normal output goes to stdout, so it can be separated from the logging. Apparently SLF4J offers a simple logger that does exactly that. What are your thoughts on replacing log4j with SLF4J-simple? (Though I have not checked that in more detail.)

lorenzleutgeb commented 2 years ago

While I do see your point on this being a design flaw (the whole lookup feature of log4j enabling this seems pretty weird to me and sounds like asking for trouble..), I'd like to point out that Alpha is not at risk from this:

To fix the design flaw, I only see two approaches for log4j, both are likely not fully possible. First, one could simply disable almost all variable substitutions (or just allow constant ones like ${java_version}), which probably causes many problems with all kinds of usages of log4j. Second, they could really fix the design by introducing a separation of formatString and rawData, where rawData never gets evaluated/substituted in any way. This looks to me like the sane way to move forward, but it shifts the burden of separating data from format to each and every user of log4j. I would be very surprised if they could actually do that.

From what I understand looking at this diff they removed lookups completely.

I strongly disagree, because there is a non-zero likelihood that someone decides to run Alpha in the back-end of their web-facing service and interfaces with the cli-app instead of an Alpha object from within Java. If the rest of the web-facing service is not written in Java, I think it is highly likely that one rather uses the cli-app instead of switching their project to Java. In such a case, we just delivered an RCE opportunity.

I don't think we can be blamed neither for depending on one of the most popular logging library for Java, nor for not finding and fixing that vulnerability ourselves. Again, I think that upgrading to a non-vulnerable version would be a reasonable demand, and we have already done that.

The main problem I see here is that those RCE's [sic] are not just some bugs now becoming apparent, but actually Log4j simply working as intended/specified. The philosophy behind this very useful logging framework seems to be deficient. Hence, I suspect that there are many more points where no one saw a problem with executing code coming from another server, without ensuring that the received code is actually what one wants.

TBH I don't know the history behind the RCE, and why these kinds of substitutions were enabled by default, but I'd venture the guess that this was more oversight than conscious decision (happy to be educated here, if you have any references). So yes, the spec is wrong, but that's human error. Happens.

To me this does not look like an oversight, but more like a general problem. In log4j there are many ways of doing lookups (see https://logging.apache.org/log4j/2.x/manual/lookups.html ) and evaluations on the log message are done repeatedly allowing many functions to be combined. For example to avoid an easy detection of the ${jndi:ldap:...} pattern, exploits use strings like ${${::-j}${::-n}${::-d}${::-I}... which evaluate to the above jndi after several rounds of log-message evaluations. Similarly, ldap is not the only way to get data out or in. Exploits use rmi and dns, too (see https://news.sophos.com/en-us/2021/12/12/log4shell-hell-anatomy-of-an-exploit-outbreak/ ). Some people rant that this is actually deeply rooted in Java, for example this commentary in German, read its second page https://www.heise.de/meinung/Kommentar-zu-log4j-Es-funktioniert-wie-spezifiziert-6294476.html . I agree with this analysis, not the ranting, but the analysis on page 2.

Es ist fuer mich sehr schwierig diesen Artikel ernst zu nehmen. Nach Absaetzen wie

Köhntopp:

Doch was ist JNDI? Jindi al Dap ist der Name eines alten arabischen Philosophen und Mathematik-Pioniers, der für Sun/Oracle gearbeitet hat, um ein System von Directory Lookups in Java zu entwickeln. Dieses System lädt irgendwie Code aus dem Internet nach. Aber selbst, wenn man länger auf das Systemdiagramm starrt, erkennt man nicht unbedingt sofort, an welcher Stelle sich der Java CLASSPATH so erweitert, dass er das gesamte Internet umfasst:

gleich zu Beginn ist jeder Vertrauensvorschuss und die meiste Kredibilitaet verbraucht. Das ist unserioes.

Köhntopp:

Aber im Ernst: Viele behandeln den log4j-Exploit wie einen Bug, einen Programmierfehler, eine Verletzung einer Spezifikation. Genau das ist jedoch nicht der Fall: Es funktioniert – wortwörtlich – endlich einmal alles wie spezifiziert und dokumentiert: All die Modularität und dynamische Erweiterbarkeit von Java hat ganz wunderbar und genau wie geplant zusammengearbeitet und funktioniert.

Mir war klar, dass hier das Problem in der Spezifikation liegt.

Darauf haben wir dekadenlang hingearbeitet! "NOTABUG, WONTFIX"

Diese Darstellung ist falsch. Die Entwickler von Log4j haben die Sicherheitsluecke ernst genommen, und getrackt (siehe LOG4J-3208).

Köhntopp:

Aber so ist es. Weniger Code ist besserer Code. Am besten so wenig Code, dass man ihn zur Gänze und mit all seinen Interaktionen verstehen kann. Und auch den Code, der notwendig ist, um den eigenen Code zu betreiben.

Das liest sich reisserisch, ist aber ein sehr komplexes Problem. Ich habe keine Zeit und Lust das aktuell noch weiter auszurollen...

To steer this in a more constructive direction, if you really consider "ripping out" Log4j, please suggest an alternative that you would be more comfortable with. We still want to log things, don't we?

I am seriously considering whether a simple write-to-stderr is the right choice for the cli-app. Since it is a cli application, the caller should know what they want to do with stuff printed on stderr. Normal output goes to stdout, so it can be separated from the logging. Apparently SLF4J offers a simple logger that does exactly that. What are your thoughts on replacing log4j with SLF4J-simple? (Though I have not checked that in more detail.)

See https://github.com/alpha-asp/Alpha/pull/323

AntoniusW commented 2 years ago

From what I understand looking at this diff they removed lookups completely.

Indeed it looks like that, wow. Let's hope this finally removes all issues of that kind.

... Some people rant that this is actually deeply rooted in Java, for example this commentary in German, read its second page https://www.heise.de/meinung/Kommentar-zu-log4j-Es-funktioniert-wie-spezifiziert-6294476.html . I agree with this analysis, not the ranting, but the analysis on page 2.

Es ist fuer mich sehr schwierig diesen Artikel ernst zu nehmen. Nach Absaetzen wie

Köhntopp: ... Das liest sich reisserisch, ist aber ein sehr komplexes Problem. Ich habe keine Zeit und Lust das aktuell noch weiter auszurollen...

I am sorry, my intention was not to anger or annoy anyone with a reference to the ranting, I basically just ignored that ranting part.

To steer this in a more constructive direction ...

See #323

I didn't expect that to be possible with changes in just 2 lines of code, nice! Thank you! :-)