Open epag opened 3 weeks ago
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T15:09:36Z
The release notes for 2.35 are here:
https://activemq.apache.org/components/artemis/download/release-notes-2.35.0
Looking over them,
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T15:14:52Z
Best guess is this one:
@ARTEMIS-4811 - Upgrade Netty to 4.1.111.Final@
I think we have some (possibly legacy) overrides in our @build.gradle@ to address (legacy-version) netty cves. You could try removing these:
// Because of CVE-2022-24823. TODO: remove when Artemis catches up
exclude group: 'io.netty'
TOGETHER WITH these:
// Because of CVE-2022-24823. TODO: remove when Artemis catches up
implementation group: 'io.netty', name: 'netty-all', version: '4.1.109.Final'
</code>
I'd be (edit: slightly) surprised if it's that simple, but you never know.
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T15:22:01Z
Thanks, James. I'll try to build the images in a few minutes.
Note that -dev03 migration to RHEL 8 is supposed to happen today, so I might be pulled aside by that work soon.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T15:55:49Z
There are other references to that CVE related to Redisson. Here is an example for the tasker:
1375 // Because of CVE-2022-24823. TODO: remove when Redisson catches up
1376 implementation group: 'io.netty', name: 'netty-all', version: '4.1.109.Final'
Do you propose that I remove all similar references, including for Redisson and Qpid, as well? Or just in the events broker where Artemis is referred to?
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T15:57:51Z
Oh, no, just Artemis. Each of these is to address the vuln in relation to a specific dependency that relies on the vulnerable netty. For Artemis, it looks like it is OBE, not the others.
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T16:05:03Z
Making the change and building/unit testing now. That change is to the wres-eventsbroker @project@, which I think will require a new .zip file be built for the core, wres-vis, and wres-writing. That means I'll have to manually rebuild the images for the worker, graphics, and writing containers, which will be fun.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T16:17:34Z
Something looks fishy in the build. Its been running for 12m+ with "2 tests completed, 2 failed", and repetitions of messages similar to this:
16:14:46.980 [Thread-15 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@52ca2652)] DEBUG org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl -- Message AMQPStandardMessage( [durable=false, messageID=396, address=BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true', size=284, scanningStatus=SCANNED, applicationProperties={}, messageAnnotations={x-opt-jms-dest=1, x-opt-jms-msg-type=3}, properties=Properties{messageId=ID:06aee21f-4ee8-474a-84e9-e64882e0ed7f:26:1:1-15, userId=null, to='BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true'', subject='null', replyTo='null', correlationId=W-tile5yGYYReUW3F9CQ--j9ahQ, contentType=application/octet-stream, contentEncoding=null, absoluteExpiryTime=null, creationTime=Tue Jun 25 16:14:46 UTC 2024, groupId='null', groupSequence=null, replyToGroupId='null'}, extraProperties = null] is not going anywhere as it didn't have a binding on address:BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true'
16:14:46.980 [Thread-15 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@52ca2652)] DEBUG org.apache.activemq.artemis.core.server.impl.ServerSessionImpl -- Routing result for AMQPStandardMessage( [durable=false, messageID=396, address=BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true', size=284, scanningStatus=SCANNED, applicationProperties={}, messageAnnotations={x-opt-jms-dest=1, x-opt-jms-msg-type=3}, properties=Properties{messageId=ID:06aee21f-4ee8-474a-84e9-e64882e0ed7f:26:1:1-15, userId=null, to='BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true'', subject='null', replyTo='null', correlationId=W-tile5yGYYReUW3F9CQ--j9ahQ, contentType=application/octet-stream, contentEncoding=null, absoluteExpiryTime=null, creationTime=Tue Jun 25 16:14:46 UTC 2024, groupId='null', groupSequence=null, replyToGroupId='null'}, extraProperties = null] = NO_BINDINGS
16:14:46.980 [Thread-15 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@52ca2652)] INFO org.apache.activemq.audit.message -- AMQ601500: User anonymous@127.0.0.1:33236 sent a message AMQPStandardMessage( [durable=false, messageID=396, address=BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true', size=284, scanningStatus=SCANNED, applicationProperties={}, messageAnnotations={x-opt-jms-dest=1, x-opt-jms-msg-type=3}, properties=Properties{messageId=ID:06aee21f-4ee8-474a-84e9-e64882e0ed7f:26:1:1-15, userId=null, to='BURL:topic://amq.topic/status?routingkey='status'&exclusive='false'&autodelete='true'', subject='null', replyTo='null', correlationId=W-tile5yGYYReUW3F9CQ--j9ahQ, contentType=application/octet-stream, contentEncoding=null, absoluteExpiryTime=null, creationTime=Tue Jun 25 16:14:46 UTC 2024, groupId='null', groupSequence=null, replyToGroupId='null'}, extraProperties = null], context: RoutingContextImpl(Address=null, routingType=null, PreviousAddress=null previousRoute:null, reusable=null, version=0)
..................................................
, transaction: null
I have lunch incoming and then the 1pm meeting. Not to mention RHEL 8 migration. I'm going to put this on-hold and leave it assigned to myself, but may need someone else to resolve this.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T16:18:54Z
By the way, it appears to be stuck within the EvaluationMessengerTest:
> :wres-events:test > Executing test wres.events.EvaluationMessagerTest
Maybe there is an artemis reference there that I didn't account for. I'll have to check later.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T17:55:46Z
Here is an update on the unit tests:
ore.server.impl.QueueImpl -- Scanning for expires on 8e936481-0a6c-4a3a-b170-4fc0d2f6dab0 done
<=====--------> 43% EXECUTING [1h 51m 49s]
> :wres-events:test > 20 tests completed, 12 failed
> :wres-events:test > Executing test wres.events.publish.MessagePublisherTest
I canceled the tests.
Looking for feedback... 2.34 appears to work out of the box with a single change to the @wres-eventsbroker/Dockerfile@. Should I push that change or should I wait for a solution to the 2.35 issue documented here.?*
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T18:01:14Z
When I made the changes James proposed in #131786-3, I also made changes to the Artemis version noted in the @build.gradle@. For example:
implementation('org.apache.activemq:artemis-server:2.35.0') {
</code>
So the @wres-eventsbroker@ uses 2.35. The @wres-events@ project then depends on that for JUnit testing:
// To create an embedded broker for unit testing
testImplementation project(':wres-eventsbroker')
</code>
It also includes usage of QPid which requires the older version of netty:
// Because of CVE-2022-24823. TODO: remove when Qpid catches up
implementation group: 'io.netty', name: 'netty-all', version: '4.1.109.Final'
</code>
So perhaps the 2.35 usage in @wres-eventsbroker@ is causing the same type of connection problem that I saw in staging when using 2.35 in the @Dockerfile@. Or perhaps there was a build conflict created when I modified @wres-eventsbroker@ without changing any other projects in @build.gradle@.
This is really outside my wheelhouse. Guess its good to branch out from time to time.
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T18:15:11Z
It may be the same issue. Unit tests use an in-memory broker, which depends on the broker version in the @build.gradle@. The service depends on the version in the dockerfile because the broker is instantiated separately. The important thing with upgrading the @build.gradle@ to 2.35 is that you update all instances and also delete those pairs of netty overrides (one to remove an old version, one to add back a later version) in each context where artemis appears. Beyond that, you won't have done anything wrong and the problem is probably the same underlying problem, and seemingly not related to netty (as I said, it would've been a lucky guess).
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T18:44:09Z
Got distracted by an email from Anna. She's looking for scatter plots of observed and predicted, or at least the data. I pointed her to the pairs for the data, but hope to find an existing ticket for the plots. Once I find that ticket (if it exists), I'll return to this and scan the @build.gradle@ carefully for any Artemis references I missed.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T19:26:24Z
"artemis" is only referred to in the @wres-eventsbroker@ @project@ in the @build.gradle@. The version 2.33 is referred to for the @server@ and the @protocol@; I updated both to 2.35. My modified @wres-eventsbroker@ @project@ looks like this:
project(':wres-eventsbroker') {
dependencies {
implementation 'org.slf4j:slf4j-api:2.0.13'
implementation 'org.apache.commons:commons-lang3:3.14.0'
// Jakarta Messaging API
implementation group: 'jakarta.jms', name: 'jakarta.jms-api', version: '3.1.0'
// Apache ActiveMQ Artemis server
implementation('org.apache.activemq:artemis-server:2.35.0') {
// TODO: remove when Artemis catches up
// Because we use slf4j, not jcl:
exclude group: 'commons-logging', module: 'commons-logging'
// Because we use slf4j, not jboss-logmanager:
exclude group: 'org.jboss.logmanager', module: 'jboss-logmanager'
// Because of CVE-2022-24823.
//exclude group: 'io.netty'
// Because of CVE-2023-2976, CVE-2020-8908.
exclude group: 'com.google.guava', module: 'guava'
// Because of CVE-2024-29131, CVE-2024-29133.
exclude group: 'org.apache.commons', module: 'commons-configuration2'
// Because of CVE-2024-34447, CVE-2024-29857, CVE-2024-30171, CVE-2024-30172.
exclude group: 'org.bouncycastle', module: 'bcprov-jdk18on'
}
// Include later dependency versions for the excluded dependencies above
implementation 'com.google.guava:guava:33.2.0-jre'
// Include later dependency version
implementation group: 'org.apache.commons', name: 'commons-configuration2', version: '2.10.1'
// Include later dependency version
implementation group: 'org.bouncycastle', name: 'bcprov-jdk18on', version: '1.78.1'
// AMQP 1.0 protocol support for the ActiveMQ Artemis server
implementation('org.apache.activemq:artemis-amqp-protocol:2.35.0') {
// Because we use slf4j, not jcl:
exclude group: 'commons-logging', module: 'commons-logging'
// Because we use slf4j, not jboss-logmanager:
exclude group: 'org.jboss.logmanager', module: 'jboss-logmanager'
// Because of CVE-2022-24823. TODO: remove when Artemis catches up
//exclude group: 'io.netty'
}
// Because of CVE-2022-24823. TODO: remove when Artemis catches up
//implementation group: 'io.netty', name: 'netty-all', version: '4.1.109.Final'
// JCIP annotations
compileOnly group: 'net.jcip', name: 'jcip-annotations', version: '1.0'
runtimeOnly('ch.qos.logback:logback-classic:1.5.6') {
// Not used at runtime, bloat
exclude group: 'edu.washington.cs.types.checker', module: 'checker-framework'
}
// Because artemis uses apache commons logging, bridge to slf4j
runtimeOnly 'org.slf4j:jcl-over-slf4j:2.1.0-alpha1'
// Qpid AMQP 1.0 Jakarta Messaging API client to test interaction with the embedded broker
testImplementation('org.apache.qpid:qpid-jms-client:2.5.0') {
// Because of CVE-2022-24823. TODO: remove when Qpid catches up
exclude group: 'io.netty'
}
// JUnit 5 API and runtime
testImplementation 'org.junit.jupiter:junit-jupiter-api:5.10.2'
testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.10.2'
}
// Facilitate unit tests with JUnit
test {
useJUnitPlatform()
}
}
</code>
I made no other changes to the @build.gradle@. With that change, the unit tests for @wres-events@ runs for a very long time with failures reported.
Per James's comment, I think this is the manifestation of the same problem I see in staging using 2.35 in the @Dockerfile@. So I'm hopefully if we can fix the unit tests, we can fix the service.
I'm going to test moving to 2.34 to see if that allows for the unit tests to complete (it allows for staging to work).
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T19:29:32Z
Right. In this context, running for a very long time will mean running through the connection retries for each unit test, most likely. The failure of the core app after no word from a subscriber in PT5M is probably not in play for the unit tests, although I would need to check. Anyway, I predict the unit tests will pass with 2.34 and it will be the same issue, but a slightly different manifestation w/r to the timeouts.
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T19:31:28Z
The 2.34 change would be to only two lines in the @build.gradle@. Since the @netty@ change James spotted was in 2.35, I need to continue to exclude @io.netty@ and forcibly set the version used to 4.1.109.
I reverted my changes for 2.35, made the changes for 2.34 (again, only two lines), and ran the build. The unit tests are on-going. If they succeed, then I plan to push my change to @build.gradle@ and @wres-eventsbroker/Dockerfile@ so that we can deploy what we have using @dockerize.sh@ without issue. However, I don't know how long 2.34 will be available given 2.35 is already out there.
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T19:33:13Z
I am pretty sure you can leave as-is (overrides removed) because 2.34 uses netty 4.1.109.Final out of the box.
Original Redmine Comment Author Name: James (James) Original Date: 2024-06-25T19:33:55Z
Remember seeing that in the release notes for 2.34.
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T19:34:34Z
James wrote:
I am pretty sure you can leave as-is (overrides removed) because 2.34 uses netty 4.1.109.Final out of the box.
Okay. I'll try that when the current set of tests complete.
I hope to get the 2.34 changes pushed today, but time is running out.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T19:44:23Z
First set of unit tests pass.
I'm now removing the netty lines and testing again,
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-06-25T19:53:14Z
Tests passed. I've pushed the 2.34 changes in commit:1f09e46d8f73ce0bd5dc08c9197b8b2d93bebe3d. That should allow us to deploy the events broker until they decide to remove the 2.34 version.
Changes for 2.35 will have to come later, perhaps when we update dependencies for the next deployment.
Hank
Original Redmine Comment Author Name: Evan (Evan) Original Date: 2024-07-02T13:44:30Z
Not addressed in 6.25
Author Name: Hank (Hank) Original Redmine Issue: 131786, https://vlab.noaa.gov/redmine/issues/131786 Original Date: 2024-06-25 Original Assignee: Hank
See #131672. If I use Artemis 2.34, the service appears to run properly: I can bring it up, post smoke tests, and see successes. If I use 2.35, subscribers fail to negotiate; example exception summary:
2024-06-25T13:43:45.936+0000 ERROR EvaluationStatusTracker While tracking evaluation 9uGAGVwNp0RbPcbkbk-ETiB8p-4, encountered an error in a messaging client that prevented the evaluation from succeeding. The error message is: client_id: "CODjKPSqSD14nQy37vqUdAe_sL0" time { seconds: 1719323025 nanos: 931731891 } completion_status: EVALUATION_COMPLETE_REPORTED_FAILURE status_events { status_level: ERROR event_message: "class wres.events.subscribe.SubscriberTimedOutException: Failed to negotiate all subscriptions within the timeout period of PT5M. Subscribers could not be negotiated for the following output formats: [CSV2]" }
This ticket can be resolved once we have identified the issue and fixed it,
Hank
Redmine related issue(s): 131911