afernandezody closed this issue 3 years ago
Hi @afernandezody thanks for reaching out.
I believe the quotes you are using in the tags parameter are not the right ones: straight ASCII quotes (") vs. curly "smart" quotes (“)
it should be
tags = {"Grafana" : "true"}
Please fix this and let me know if it solves the error.
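If it helps, here is a quick way to check a config line for stray smart quotes; the sample lines below are illustrations, so point grep at your actual config file instead:

```shell
# Curly “smart” quotes (U+201C/U+201D) often sneak in when copying config
# snippets from a browser; ParallelCluster expects straight ASCII quotes.
bad='tags = {“Grafana” : “true”}'     # wrong: curly quotes
good='tags = {"Grafana" : "true"}'    # right: straight quotes

printf '%s\n' "$bad"  | grep -c '“'   # prints 1 (smart quote found)
printf '%s\n' "$good" | grep -c '“'   # prints 0 (clean)
```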
Thanks
That doesn't make any difference.
Can you please try to completely remove the tags parameter?
Hi @nicolaven,

Somehow, I cannot even log in on the master instance after removing the tag. I had to roll it back and couldn't check any logs.

In addition to this issue, something else has caught my attention. The post-install script uses post_install_args, which is set to https://github.com/aws-samples/aws-parallelcluster-monitoring/tarball/main,aws-parallelcluster-monitoring,install-monitoring.sh. However, there is no 'tarball' subdirectory in the repository, as everything looks uncompressed (or maybe the post-install script takes care of this, but it doesn't look like that to me).

Thanks.
Hi @afernandezody
if you go to this URL https://github.com/aws-samples/aws-parallelcluster-monitoring/tarball/main
with your browser, you can see it is actually downloading a tar.gz file. The post-install script is basically downloading that file and un-tarring it.
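For context, the three comma-separated fields in post_install_args are the tarball URL, the folder name, and the setup script to run. A rough sketch of how they get consumed (the variable names here are illustrative, not the actual script's):

```shell
# Illustrative sketch of how the post-install step consumes post_install_args:
# field 1 = tarball URL, field 2 = directory name, field 3 = setup script.
args="https://github.com/aws-samples/aws-parallelcluster-monitoring/tarball/main,aws-parallelcluster-monitoring,install-monitoring.sh"

IFS=',' read -r url dir script <<EOF
$args
EOF

echo "url:    $url"
echo "dir:    $dir"
echo "script: $script"

# The actual network step would look roughly like this (not run here):
#   mkdir -p "$dir"
#   wget -qO- "$url" | tar xz --strip-components=1 -C "$dir"
#   bash "$dir/$script"
```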
Regarding the tags, I would suggest trying to recreate a new cluster using this configuration file
[global]
update_check = true
sanity_check = true
cluster_template = w1cluster
[aws]
aws_region_name = us-east-1
aws_access_key_id = ***
aws_secret_access_key = ***
[cluster w1cluster]
vpc_settings = odyvpc
placement_group = DYNAMIC
placement = compute
key_name = llave_i3
master_instance_type = t3.micro
compute_instance_type = c5.large
cluster_type = spot
disable_hyperthreading = true
initial_queue_size = 2
max_queue_size = 2
maintain_initial_size = true
scheduler = slurm
base_os = alinux2
post_install = https://raw.githubusercontent.com/aws-samples/aws-parallelcluster-monitoring/main/post-install.sh
post_install_args = https://github.com/aws-samples/aws-parallelcluster-monitoring/tarball/main,aws-parallelcluster-monitoring,install-monitoring.sh
additional_iam_policies = arn:aws:iam::aws:policy/CloudWatchFullAccess,arn:aws:iam::aws:policy/AWSPriceListServiceFullAccess,arn:aws:iam::aws:policy/AmazonSSMFullAccess,arn:aws:iam::aws:policy/AWSCloudFormationReadOnlyAccess
tags = {"Grafana" : "true"}
[vpc odyvpc]
master_subnet_id = ***
vpc_id = ***
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
I would not have figured it out in a million years. The cluster launched and I was able to open the launch screen in the browser (although I didn't know the Grafana password).

However, my main issue now is that trying CentOS8 as the OS results in the compute instances being created and terminated in an apparently endless loop. (I had read the comment that only alinux2 has been tested.) I went over all the files but didn't find anything outstanding. The only thing that crossed my mind was whether the variable 'cfn_cluster_user' is not being gathered correctly.

Any thoughts on why it doesn't work with CentOS8? Thanks.
Yes, I confirm that this monitoring dashboard has only been tested with AL2.
I'd suggest having a look at the installation logs here: /tmp/monitoring-setup.log
and trying to figure out what's wrong with CentOS8. Most likely it is the installation of the components, here: https://github.com/aws-samples/aws-parallelcluster-monitoring/blob/main/parallelcluster-setup/install-monitoring.sh
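Since the script was only tested on AL2, a likely failure point on CentOS8 is the package-install commands. A minimal, hypothetical sketch of branching on /etc/os-release (the function name and branch labels are made up; the real install-monitoring.sh may be structured differently):

```shell
# Hypothetical sketch: pick an install path based on the OS, the way a
# post-install script might need to in order to support CentOS 8.
# ID and VERSION_ID come from /etc/os-release (e.g. ID=amzn, VERSION_ID=2).
os_path() {
  case "$1$2" in
    amzn2)    echo "amazon-linux-2: yum + amazon-linux-extras" ;;
    centos8*) echo "centos-8: dnf" ;;
    centos7*) echo "centos-7: yum" ;;
    *)        echo "untested: $1 $2" ;;
  esac
}

# On a real node you would source the file first:
#   . /etc/os-release
#   os_path "$ID" "$VERSION_ID"
os_path amzn 2     # prints "amazon-linux-2: yum + amazon-linux-extras"
os_path centos 8   # prints "centos-8: dnf"
```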
Feel free to send a PR with the modification needed.
Thanks
Any progress? Do you need help?
Hi @nicolaven,
It's working for both CentOS7 & 8. The only thing that I haven't tested is p3 (or other GPU) compute instances, but that should be no problem, as it's only a minor correction.
Best.
Closing as the PR fixes my original problem, but let me know if the PR needs any fixing.
Hello, maybe there is something wrong in my script,
because it complains about validation as soon as it starts creating CloudWatchLogsSubstack.
Thanks.