ULHPC / puppet-slurm

A Puppet module designed to configure and manage SLURM (see https://slurm.schedmd.com/), an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
Apache License 2.0

Prevent improper use of SchedMD resources #14

Closed bluemage650 closed 5 years ago

bluemage650 commented 6 years ago

So, funny story, I just got an email from SchedMD saying that our site had downloaded the same archive 86,000 times. Oops.

I've hardcoded a local mirror (https://github.com/richardcq/puppet-slurm/commit/28206b3b6849ab43c97fc9fd26cad7a342e86bd1 -- not a very smart fix, but it works) because it doesn't seem like there's an option to do so anywhere else.

I think the issue was that I set the src_checksum => '' in the slurm class, causing it to always download the new version? If that's the case, maybe it would be good to pull out the download logic into its own resource and prevent blanking the checksum if the download URL isn't manually specified?

Falkor commented 6 years ago

Hmm, that's weird, as the source download relies on the official puppet-archive module (see manifests/download.pp). In particular, if the tarball is still present and its checksum is correct, it is not downloaded again.

Could it be that the checksum was wrong?

bluemage650 commented 6 years ago

Well, I specified the checksum as the empty string, so it would always be wrong, right? I have no other explanation for this behavior, as it would otherwise have taken about 2000 full fleet reimages to generate the same traffic. Will test against the local mirror and get you more details on Monday.

Falkor commented 6 years ago

Then that could be an explanation. You should specify it (with the expected value, which you can find at https://www.schedmd.com/downloads.php) to avoid continuous downloads. Alternatively, if you don't plan to use it, you should probably set checksum_verify to false. Then I assume that the archive is not downloaded if the file is present.

bluemage650 commented 6 years ago

checksum_verify on the puppet-archive resource should be set to false if the checksum value is blank according to the logic you linked to, which is why I am confused.

On Sun, Nov 4, 2018, 4:35 PM Sebastien Varrette <notifications@github.com> wrote:

> Then that could be an explanation. You should specify it (with the expected value, which you can find at https://www.schedmd.com/downloads.php) to avoid continuous downloads. Alternatively, if you don't plan to use it, you should probably set checksum_verify to false. Then I assume that the archive is not downloaded if the file is present.


Falkor commented 6 years ago

You're absolutely right, I'm tired ;) I would suggest setting the checksum back to its expected value until this has been investigated further.

bluemage650 commented 6 years ago

I just checked the Apache logs for my local mirror and it's definitely hammering the HTTP server every 30 minutes (the Puppet polling time). From the puppet agent -vt run:

Notice: /Stage[main]/Slurm::Install/Slurm::Download[17.11.12]/Archive[slurm-17.11.12.tar.bz2]/ensure: replace archive: /usr/local/src/slurm-17.11.12.tar.bz2 from (md5)94fb13b509d23fcf9733018d6c961ca9 to (md5)false

$ md5sum /usr/local/src/slurm-17.11.12.tar.bz2
94fb13b509d23fcf9733018d6c961ca9  /usr/local/src/slurm-17.11.12.tar.bz2
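The md5sum output above gives the digest of the tarball already on disk, so one way to follow the earlier advice is to pin that value. A minimal sketch, assuming the local copy is intact (worth comparing against the value published at https://www.schedmd.com/downloads.php first) and that src_checksum is compared as an MD5 digest, which the (md5) prefix in the notice suggests:

```puppet
class { 'slurm':
  # ... other parameters unchanged ...
  version      => '17.11.12',
  # MD5 of /usr/local/src/slurm-17.11.12.tar.bz2 as reported by md5sum above.
  # Assumption: this matches the upstream tarball; verify before relying on it.
  src_checksum => '94fb13b509d23fcf9733018d6c961ca9',
}
```

With a non-empty checksum that matches the file, the archive resource should treat the existing tarball as in sync and skip the download on later agent runs.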

bluemage650 commented 6 years ago

  class {'slurm':
    with_slurmd       => !$isMaster,
    with_slurmctld    => $isMaster,
    with_slurmdbd     => $isMaster,
    manage_accounting => true,
    clustername       => '...',
    controlmachine    => '...',
    uid               => 500,
    gid               => 500,
    munge_uid         => 501,
    munge_gid         => 501,
    version           => '17.11.12',
    src_checksum      => '',
    #https://github.com/ULHPC/puppet-slurm/issues/12
    nodes             => $slurmNodes,
    partitions        => $slurmPartitions,
    munge_key_content => base64("decode", lookup("vault_munge_key"))
  }

Falkor commented 6 years ago

Can you check whether setting the checksum_type parameter to none (it is md5 by default) prevents the forced replacement?
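For reference, this is roughly what that setting looks like on a bare puppet-archive resource, declared outside the slurm class. It is a sketch only: the mirror URL is hypothetical, and it says nothing about whether the slurm module passes checksum_type through to its own archive resource.

```puppet
# Standalone illustration, not the module's manifests/download.pp.
archive { '/usr/local/src/slurm-17.11.12.tar.bz2':
  ensure        => present,
  # Hypothetical local mirror; replace with your own.
  source        => 'https://mirror.example.org/slurm/slurm-17.11.12.tar.bz2',
  # 'none' disables checksum comparison for this resource.
  checksum_type => 'none',
  extract       => false,
}
```

If the agent still reports a replacement with an (md5) prefix after the parameter is set on the slurm class, that would suggest the value is not reaching the underlying archive resource.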

Falkor commented 5 years ago

I'm closing this issue for the moment. Reopen it if you still experience the problem or if the check suggested above is not successful.

bluemage650 commented 5 years ago

Sorry about the delay, this completely slipped my mind. Just to verify that this is your suggested solution:

  class {'slurm':
    ...
    version => '17.11.9-2',
    src_checksum => '',
    checksum_type => 'none',
    ...
  }

If so, that does not work.

Notice: /Stage[main]/Slurm::Install/Slurm::Download[17.11.12]/Archive[slurm-17.11.12.tar.bz2]/ensure: replace archive: /usr/local/src/slurm-17.11.12.tar.bz2 from (md5)94fb13b509d23fcf9733018d6c961ca9 to (md5)false

@Falkor I cannot reopen it since I'm not a contributor.