cloudera / cloudera-scripts-for-log4j

Scripts for addressing log4j zero day security issue
Apache License 2.0

No validation on backup files - roll back not possible #27

Open starkjs opened 2 years ago

starkjs commented 2 years ago

There is no validation on backup files. I have a case where the script filled up the backup path, so a number of jar files were modified without ever being backed up. That means those files cannot be rolled back, which is very bad.

if [ ! -f "$targetbackup" ]; then
  echo "Backing up to '$targetbackup'"
  cp -f "$jarfile" "$targetbackup"
fi
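One way to guard against the partial-backup case described above is to copy to a temporary name, compare checksums, and only then rename the copy into place. The following is a minimal sketch, not the script's actual fix: the function name `backup_and_verify` and the `.partial` suffix are hypothetical, and it assumes bash and GNU `sha256sum` are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloudera script): back up "$1" to "$2"
# and return 0 only if a verified copy exists at "$2" afterwards.
backup_and_verify() {
  local src="$1" dest="$2" tmp="$2.partial"
  local src_sum tmp_sum

  # Assumption: "$dest" is only ever created by the rename at the end of this
  # function, so an existing backup has already been verified.
  if [ -f "$dest" ]; then
    return 0
  fi

  echo "Backing up to '$dest'"
  if ! cp -f "$src" "$tmp"; then
    echo "ERROR: copy to '$tmp' failed (disk full or permissions?)" >&2
    rm -f "$tmp"   # remove any partial copy
    return 1
  fi

  # Compare SHA-256 digests of the source and the copy.
  src_sum=$(sha256sum "$src" | cut -d' ' -f1)
  tmp_sum=$(sha256sum "$tmp" | cut -d' ' -f1)
  if [ -z "$src_sum" ] || [ "$src_sum" != "$tmp_sum" ]; then
    echo "ERROR: checksum mismatch for '$tmp'" >&2
    rm -f "$tmp"
    return 1
  fi

  # Rename only after verification, so "$dest" never holds a partial file.
  mv "$tmp" "$dest"
}
```

A caller would modify the jar only when `backup_and_verify "$jarfile" "$targetbackup"` returns 0, and leave the jar untouched otherwise.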

jtran-cloudera commented 2 years ago

Thanks for the report. We are looking into a fix.

starkjs commented 2 years ago

No worries @jtran-cloudera

I will submit a PR today; I have a number of fixes. My clients have raised cases with Cloudera too, so it's in the notes for those cases.

starkjs commented 2 years ago

Hi @jtran-cloudera, I am not able to share my code publicly as it's IP. I have fed the code changes back via our Cloudera consultant, and he will pass them on through the Cloudera case my client has open.

Thanks Josh

starkjs commented 2 years ago

Hi @jtran-cloudera, @sdevineni, I see you added the code to validate the backup file, but only for jar files. It's also needed for every other backup file, such as the tar.gz, nar and the new uberjar code.

I see you also added https://github.com/cloudera/cloudera-scripts-for-log4j/blob/ce8dfbe6e2a2e899306726acd5767668e2b24d23/cm_cdp_cdh_log4j_jndi_removal.sh#L119 for the case where the code doesn't match the backup. I think that is a bad idea, as it will exit the entire script at that point.

Thanks Josh

sunilgovind commented 2 years ago

Yes, we are working to update this for nar files as well.

If a backup fails, it could be because of permissions or space-related issues. Hence a fail-fast approach was adopted, so that the reason behind the backup failure can be figured out.
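
The two failure causes named here, permissions and free space, can also be checked before any copy is attempted. The sketch below is only an illustration under assumptions: the function name `can_back_up` is hypothetical, and it relies on `du -k` and `df -Pk` both reporting sizes in 1024-byte units.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the Cloudera script): succeed only
# if "$2" is a writable directory with enough free space to hold file "$1".
can_back_up() {
  local src="$1" backup_dir="$2"
  local need_kb free_kb

  if [ ! -f "$src" ]; then
    echo "SKIP: '$src' is not a regular file" >&2
    return 1
  fi
  if [ ! -d "$backup_dir" ] || [ ! -w "$backup_dir" ]; then
    echo "SKIP: '$backup_dir' is missing or not writable" >&2
    return 1
  fi

  # Disk usage of the source file, in KB (first tab-separated field of du).
  need_kb=$(du -k "$src" | cut -f1)
  # "Available" column of the POSIX-format df output for the backup directory.
  # Assumes the filesystem name contains no spaces.
  free_kb=$(df -Pk "$backup_dir" | awk 'NR==2 {print $4}')

  if [ -z "$free_kb" ] || [ "$free_kb" -lt "$need_kb" ]; then
    echo "SKIP: not enough free space in '$backup_dir' for '$src'" >&2
    return 1
  fi
  return 0
}
```

A check like this narrows down the cause up front, but it does not replace verifying the backup after the copy, since space can run out between the check and the write.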

starkjs commented 2 years ago

Hi @sunilgovind,

Sounds good. I have already added the sha checksum to the tar.gz and nar too.

I disagree. From an automation point of view, I don't want the script to die. It should report the issue, take no action on the affected file, and move on. When you have to run the patch on hundreds or thousands of servers, you don't have time to stop and debug in Production. All testing needs to be done in NonProd, with all issues sorted out before running in Production.

Thanks Josh
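
The report-and-continue behaviour argued for above could look roughly like the sketch below. It is an illustration only: `process_all` is a hypothetical name, and the per-file action is passed in as a function or command that returns non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical driver (not part of the Cloudera script): run an action on each
# file, report failures, and keep going instead of exiting on the first one.
# Usage: process_all <action> <file>...
process_all() {
  local action="$1"; shift
  local failed=0 f

  for f in "$@"; do
    if ! "$action" "$f"; then
      echo "FAILED: $f" >&2
      failed=$((failed + 1))
    fi
  done

  echo "Processed $# file(s), $failed failure(s)"
  # Non-zero exit status if anything failed, so automation can still detect it.
  [ "$failed" -eq 0 ]
}
```

The run completes across every file, while the summary line and the exit status still tell the calling automation that something needs follow-up.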