aws-samples / 1click-hpc

Deploy your HPC Cluster on AWS in 20min. with just 1-Click.
MIT No Attribution

multiple 1click-hpc clusters with the same FSx will crash enginframe #27

Open rvencu opened 2 years ago

rvencu commented 2 years ago

Since the /fsx/nice location is not unique to the cluster, starting multiple clusters with the same FSx will overwrite the portal data of the older clusters.

nicolaven commented 2 years ago

Yes, right. We could consider using something like /fsx/nice/{instanceID}/ instead. What do you think?

rvencu commented 2 years ago

Sounds reasonable


rvencu commented 2 years ago

Or what about using the cluster name, which has to be unique anyway?

"export CLUSTER_NAME=${AWS::StackName}"

nicolaven commented 2 years ago

Yes, this is an option, because the module that installs EF backs up an existing EF installation. So you are fine deleting an old cluster and creating a new one with the same name, mounting the same FSx.

rvencu commented 2 years ago

In post.install.sh, line 62:

    export NICE_ROOT=$(jq --arg default "${SHARED_FS_DIR}/nice/${stack_name}" -r '.post_install.enginframe | if has("nice_root") then .nice_root else $default end' "${dna_json}")
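To illustrate how the fallback in that jq expression behaves, here is a minimal sketch that runs the same filter against two throwaway JSON files. The helper name `resolve_nice_root`, the JSON contents, the `/fsx` mount point and the stack name `cluster-a` are all made up for illustration and are not taken from the repository. Note that both samples contain a `post_install.enginframe` object; jq's `has()` expects an object as input, so the filter would error out if that key were missing entirely.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, file contents and paths are hypothetical.

# Resolve NICE_ROOT: use .post_install.enginframe.nice_root when present,
# otherwise fall back to a directory named after the stack.
resolve_nice_root() {
    local dna_json="$1" shared_fs_dir="$2" stack_name="$3"
    jq --arg default "${shared_fs_dir}/nice/${stack_name}" -r \
        '.post_install.enginframe | if has("nice_root") then .nice_root else $default end' \
        "${dna_json}"
}

tmp_dir="$(mktemp -d)"

# Case 1: no nice_root key, so the per-stack default is used.
printf '%s\n' '{"post_install": {"enginframe": {}}}' > "${tmp_dir}/default.json"
default_root="$(resolve_nice_root "${tmp_dir}/default.json" /fsx cluster-a)"

# Case 2: an explicit nice_root overrides the default.
printf '%s\n' '{"post_install": {"enginframe": {"nice_root": "/fsx/custom"}}}' > "${tmp_dir}/override.json"
override_root="$(resolve_nice_root "${tmp_dir}/override.json" /fsx cluster-a)"

echo "default:  ${default_root}"
echo "override: ${override_root}"

rm -rf "${tmp_dir}"
```

With a per-stack default, two clusters that share one file system only collide if someone sets the same explicit `nice_root` for both.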

In 10.install.enginframe.headnode.sh, line 60:

    if [[ -d "${SHARED_FS_DIR}/nice/${stack_name}" ]]; then
        mv -f "${SHARED_FS_DIR}/nice/${stack_name}" "${SHARED_FS_DIR}/nice/${stack_name}.$(date "+%d-%m-%Y-%H-%M").BAK"
    fi

Then multiple clusters can live side by side with their EnginFrame portals intact.
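The effect of the per-stack backup can be tried out without a real FSx mount. The sketch below wraps the same `if`/`mv` logic in a hypothetical function, `backup_existing_nice_root`, and uses a temporary directory in place of `/fsx`; the cluster names and the `portal-data` file are placeholders, not names from the repository.

```shell
#!/usr/bin/env bash
# Sketch only: function name, cluster names and file names are hypothetical.

# Rename an existing per-stack EnginFrame directory to a timestamped .BAK copy.
backup_existing_nice_root() {
    local shared_fs_dir="$1" stack_name="$2"
    local target="${shared_fs_dir}/nice/${stack_name}"
    if [[ -d "${target}" ]]; then
        mv -f "${target}" "${target}.$(date "+%d-%m-%Y-%H-%M").BAK"
    fi
}

shared_fs="$(mktemp -d)"   # stands in for the shared /fsx mount

# Two clusters already have portal data on the same file system.
mkdir -p "${shared_fs}/nice/cluster-a" "${shared_fs}/nice/cluster-b"
touch "${shared_fs}/nice/cluster-a/portal-data" "${shared_fs}/nice/cluster-b/portal-data"

# Re-deploying cluster-a backs up only its own directory, then starts fresh.
backup_existing_nice_root "${shared_fs}" cluster-a
mkdir -p "${shared_fs}/nice/cluster-a"

entries="$(ls "${shared_fs}/nice")"
echo "${entries}"

rm -rf "${shared_fs}"
```

The listing should show `cluster-a`, a `cluster-a.<timestamp>.BAK` directory, and an untouched `cluster-b`. One caveat with the minute-resolution timestamp: if a backup with the same name already exists, `mv` moves the directory inside it rather than replacing it.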

nicolaven commented 2 years ago

Yep! Can you send a PR specifically with this modification so I can incorporate it?