IBM-Cloud / satellite-mvi-lab

Apache License 2.0

Feedback on MVI on Sat README #11

Open lionelmace opened 1 year ago

lionelmace commented 1 year ago

I just started my first MVI installation and wanted to share some feedback.

rhocheck commented 1 year ago

Why this All IAM policy? This does not make sense.

It did not work without it. We didn't have time to determine exactly which policy was missing.

lionelmace commented 1 year ago
./sat-deploy.sh env apply -e env_id="${ENV_ID}" -e IBM_ODF_API_KEY="${IBM_ODF_API_KEY}" -e OCP_PULL_SECRET='${OCP_PULL_SECRET}' -v
Extra parameters (0): env_id=xy-mvi5
Extra parameters (1): IBM_ODF_API_KEY=XXXX
Extra parameters (2): OCP_PULL_SECRET=${OCP_PULL_SECRET}
Error: statfs /Users/lionelmace/git/satellite-mvi-lab/data/status/sample: no such file or directory
Error: specify at least one container name or ID to log
Error: accepts 1 arg(s), received 0
  1. Restart the podman machine
    podman machine stop
    podman machine start
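A side note on the output above: `Extra parameters (2)` shows the literal text `${OCP_PULL_SECRET}` rather than a value. That is consistent with the single quotes around that argument in the command, because the shell does not expand variables inside single quotes. A minimal sketch of the difference, using a made-up placeholder value (not a real pull secret):

```shell
# Hypothetical placeholder value, for illustration only.
OCP_PULL_SECRET='example-secret'

# Single quotes: the text is kept literally, no expansion happens.
single='${OCP_PULL_SECRET}'

# Double quotes: the variable is expanded to its value.
double="${OCP_PULL_SECRET}"

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```

If expansion is what was intended, passing `-e OCP_PULL_SECRET="${OCP_PULL_SECRET}"` with double quotes, like the other two parameters, would be the usual shell form. Whether that is related to the errors shown here is not confirmed.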
rhocheck commented 1 year ago

[ ] When saying "This step is done by the automation.", please be more precise. Does that mean that the sat-deploy script installs ODF?

That means that the sat-deploy script launches a container, and an Ansible role inside it installs ODF. The framework is described in detail in the referenced project: https://ibm.github.io/cloud-pak-deployer/

lionelmace commented 1 year ago

@rhocheck I put myself in the shoes of someone who is not aware of what Cloud Pak Deployer is doing. Steps 4 and 5 could be merged into Step 3, which would have a brief introduction that looks like the following: This step will automate the installation of Satellite, OpenShift Cluster, ODF (OpenShift Data Foundation) and OpenShift Registry.

lionelmace commented 1 year ago
Error: [ERROR] Instance (02d7_09e9a14e-80de-4934-8c54-200a8a5ecb88) went into failed state during the operation 
 ([
    {
        "code": "cannot_start_capacity",
        "message": "Can't start instance because resource capacity is unavailable.",
        "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-capacity"
    }
]) 
 [WARNING] Running terraform apply again will remove the tainted instance and attempt to create the instance again replacing the previous configuration

  with ibm_is_instance.xy_mvi5_gpu,
  on sat_host_xy-mvi5-gpu.tf line 29, in resource "ibm_is_instance" "xy_mvi5_gpu":
  29: resource "ibm_is_instance" "xy_mvi5_gpu" {

Solution

  1. Edit the file data/config/sample/config/sat-ibm-cloud-roks.yaml
  2. In the subnet line, replace zone 3 with zone 2:
    - name: {{ env_id }}-gpu
      flavour: gpu
      sat_zone_idx: 3
      infrastructure:
        image: {{ env_id }}-rhcos410
        subnet: {{ env_id }}-subnet-zone-2
        bastion_host: {{ env_id }}-bastion
        keys:
        - "{{ env_id }}-provision"
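
To locate the line to change in step 2, it can help to list every `subnet:` entry in the config file with its line number first. A minimal sketch, assuming it is run from the repository root; `list_subnets` is a hypothetical helper name, not part of sat-deploy:

```shell
# Config path taken from step 1 of the solution above.
CONFIG="data/config/sample/config/sat-ibm-cloud-roks.yaml"

# list_subnets FILE: print each "subnet:" line prefixed with its line number.
list_subnets() {
  grep -n 'subnet:' "$1"
}

# Only run against the config if it exists in the current directory tree.
if [ -f "$CONFIG" ]; then
  list_subnets "$CONFIG"
fi
```

The GPU host entry is the one whose `name` ends in `-gpu`; its `subnet` line is the one to edit.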
rhocheck commented 1 year ago

@rhocheck I put myself in the shoes of someone who is not aware of what Cloud Pak Deployer is doing. Steps 4 and 5 could be merged into Step 3, which would have a brief introduction that looks like the following: This step will automate the installation of Satellite, OpenShift Cluster, ODF (OpenShift Data Foundation) and OpenShift Registry.

The whole idea of this lab is to get familiar with deployment automation. Before participants start doing anything, we would talk about this.

rhocheck commented 1 year ago

Steps 4 and 5 are essential tasks when deploying IBM Cloud Satellite. We have automated those tasks because a million things can go wrong that are hard to correct. Nevertheless, it is very important to understand what has been done under the hood.