cloudstax / firecamp

Serverless Platform for the stateful services
https://www.cloudstax.io
Apache License 2.0

firecamp-service-cli ignores memory parameters #64

Closed ddt7 closed 6 years ago

ddt7 commented 6 years ago

After deploying firecamp.template, I logged in to the bastion host and ran firecamp-service-cli with -reserve-memory=768, but the service task definition still has "memoryReservation": 1024. When testing on a t2.micro, there isn't 1024MB left.

JuniusLuo commented 6 years ago

Which service are you creating? Could you please post all parameters?

The reserve-memory value can be overridden for some JVM-based services, such as Cassandra/Kafka/ZooKeeper/ElasticSearch. For example, the Cassandra JVM heap Xms and Xmx are set to the same value. If the Cassandra JVM heap size is set to 1024MB, the reserved memory is set to 1024MB as well. This prevents the JVM memory from being swapped out to disk, which would have a big impact on JVM performance.
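One way to picture the override described above is as taking the larger of the requested reservation and the JVM heap size. This is only a sketch of the rule as stated in this thread, not FireCamp's actual code; the function name effective_reserve_mb is hypothetical.

```shell
# Hypothetical helper, not part of firecamp-service-cli.
# Assumption: the reserved memory is never smaller than the JVM heap size.
effective_reserve_mb() {
  reserve_mb=$1
  heap_mb=$2
  if [ "$heap_mb" -gt "$reserve_mb" ]; then
    echo "$heap_mb"
  else
    echo "$reserve_mb"
  fi
}

effective_reserve_mb 512 1024   # heap larger than the requested reservation
effective_reserve_mb 512 384    # heap smaller than the requested reservation
```

Under this assumption, the first call yields the heap size and the second keeps the requested reservation.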

ddt7 commented 6 years ago

I am running Cassandra as follows:

./firecamp-service-cli -region=us-east-1 -cluster=casdb -op=create-service -service-type=cassandra -service-name=t1 -replicas=3 -volume-size=5 -journal-volume-size=1 -max-memory=770 -reserve-memory=512 -cas-heap-size=256 -jmx-user=jmxuser -jmx-passwd=changeme

JuniusLuo commented 6 years ago

Thanks for posting the full CLI command. I could not reproduce it. Could you please post a screenshot of the container definition?

256MB is too small for a 3-node Cassandra cluster. The Cassandra container image includes the Jolokia agent for monitoring. The JVM for a 3-node Cassandra cluster needs at least 768MB. You will need to use a t2.small instance for testing.

ddt7 commented 6 years ago

For testing I can give it 768MB. Here is the container definition:

{ "executionRoleArn": null, "containerDefinitions": [ { "dnsSearchDomains": null, "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "casdb-t1-9a7d8bff89984c4d543654344b58b284", "awslogs-region": "us-east-1" } }, "entryPoint": null, "portMappings": [ { "hostPort": 7000, "protocol": "tcp", "containerPort": 7000 }, { "hostPort": 7001, "protocol": "tcp", "containerPort": 7001 }, { "hostPort": 7199, "protocol": "tcp", "containerPort": 7199 }, { "hostPort": 9042, "protocol": "tcp", "containerPort": 9042 }, { "hostPort": 9160, "protocol": "tcp", "containerPort": 9160 }, { "hostPort": 8778, "protocol": "tcp", "containerPort": 8778 } ], "command": null, "linuxParameters": null, "cpu": 256, "environment": [ { "name": "VERSION", "value": "latest" } ], "ulimits": null, "dnsServers": null, "mountPoints": [ { "readOnly": null, "containerPath": "/data", "sourceVolume": "9a7d8bff89984c4d543654344b58b284" }, { "readOnly": null, "containerPath": "/journal", "sourceVolume": "journal_9a7d8bff89984c4d543654344b58b284" } ], "workingDirectory": null, "dockerSecurityOptions": null, "memory": null, "memoryReservation": 1024, "volumesFrom": [], "image": "cloudstax/firecamp-cassandra:3.11", "disableNetworking": null, "healthCheck": null, "essential": true, "links": null, "hostname": null, "extraHosts": null, "user": null, "readonlyRootFilesystem": null, "dockerLabels": null, "privileged": false, "name": "casdb-t1-container" } ], "placementConstraints": [], "memory": null, "taskRoleArn": null, "compatibilities": [ "EC2" ], "taskDefinitionArn": "arn:aws:ecs:us-east-1:709846101695:task-definition/casdb-t1:1", "family": "casdb-t1", "requiresAttributes": [ { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" } ], "requiresCompatibilities": [], "networkMode": "host", "cpu": null, "revision": 1, "status": "ACTIVE", "volumes": [ { "name": "9a7d8bff89984c4d543654344b58b284", "host": { "sourcePath": "9a7d8bff89984c4d543654344b58b284" } }, { "name": "journal_9a7d8bff89984c4d543654344b58b284", "host": { "sourcePath": "journal_9a7d8bff89984c4d543654344b58b284" } } ] }

JuniusLuo commented 6 years ago

It looks like you are using the latest release. I tried the following on my testbed:

./firecamp-service-cli -region=us-east-1 -cluster=t1 -op=create-service -service-type=cassandra -service-name=t1 -replicas=1 -volume-size=1 -journal-volume-size=1 -max-memory=770 -reserve-memory=512 -cas-heap-size=384 -jmx-user=jmxuser -jmx-passwd=changeme

Both memory and memoryReservation are set correctly. In your output, however, memory is null and memoryReservation is 1024.

{
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "t1-t1-2c4c4536f9b042fd771e2d7788e3ad67",
          "awslogs-region": "us-east-1"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 7000,
          "protocol": "tcp",
          "containerPort": 7000
        },
        {
          "hostPort": 7001,
          "protocol": "tcp",
          "containerPort": 7001
        },
        {
          "hostPort": 7199,
          "protocol": "tcp",
          "containerPort": 7199
        },
        {
          "hostPort": 9042,
          "protocol": "tcp",
          "containerPort": 9042
        },
        {
          "hostPort": 9160,
          "protocol": "tcp",
          "containerPort": 9160
        },
        {
          "hostPort": 8778,
          "protocol": "tcp",
          "containerPort": 8778
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 256,
      "environment": [
        {
          "name": "VERSION",
          "value": "latest"
        }
      ],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": null,
          "containerPath": "/data",
          "sourceVolume": "2c4c4536f9b042fd771e2d7788e3ad67"
        },
        {
          "readOnly": null,
          "containerPath": "/journal",
          "sourceVolume": "journal_2c4c4536f9b042fd771e2d7788e3ad67"
        }
      ],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": 770,
      "memoryReservation": 512,
      "volumesFrom": [],
      "image": "cloudstax/firecamp-cassandra:3.11",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": false,
      "name": "t1-t1-container"
    }
  ],
  "placementConstraints": [],
  "memory": null,
  "taskRoleArn": null,
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:497621646529:task-definition/t1-t1:1",
  "family": "t1-t1",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [],
  "networkMode": "host",
  "cpu": null,
  "revision": 1,
  "status": "ACTIVE",
  "volumes": [
    {
      "name": "2c4c4536f9b042fd771e2d7788e3ad67",
      "host": {
        "sourcePath": "2c4c4536f9b042fd771e2d7788e3ad67"
      }
    },
    {
      "name": "journal_2c4c4536f9b042fd771e2d7788e3ad67",
      "host": {
        "sourcePath": "journal_2c4c4536f9b042fd771e2d7788e3ad67"
      }
    }
  ]
}
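To compare the two fields without reading the whole task definition, a small filter along these lines may help. It is a sketch that assumes jq is installed and that the JSON has the same shape as the output above; the function name container_memory and the file name taskdef.json are hypothetical.

```shell
# Print the name, hard limit (memory) and soft limit (memoryReservation)
# of every container in a task definition JSON read from stdin.
# Assumes jq is available; the function name is illustrative only.
container_memory() {
  jq -r '.containerDefinitions[] | "\(.name) memory=\(.memory) memoryReservation=\(.memoryReservation)"'
}

# Usage (taskdef.json is a hypothetical file holding the JSON above):
#   container_memory < taskdef.json
```

A field that is unset in the task definition shows up as null in the printed line, which makes a missing hard limit easy to spot.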

JuniusLuo commented 6 years ago

Not sure why your testbed has this weird behavior. Could you please check the system?

  1. On the worker node, run sudo docker plugin ls
  2. On the bastion node, delete the previous CLI and download the latest one again: https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/packages/firecamp-service-cli.tgz
  3. Try again using the latest CLI.

ddt7 commented 6 years ago

I ran it again (by the way, my script already took the CLI from the same S3 latest link). The memory settings were OK this time, BUT now task creation fails:

Status reason CannotCreateContainerError: API error (500): create 66b84f2254c64e2b7430388fe1201c2c: VolumeDriver.Create: Create, GetServiceAttr error DB RecordNotFound req {66b84f2254c64e2b7430388fe1201c2c map[]}


Task definition:

{ "executionRoleArn": null, "containerDefinitions": [ { "dnsSearchDomains": null, "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "casdb-t1-66b84f2254c64e2b7430388fe1201c2c", "awslogs-region": "us-east-1" } }, "entryPoint": null, "portMappings": [ { "hostPort": 7000, "protocol": "tcp", "containerPort": 7000 }, { "hostPort": 7001, "protocol": "tcp", "containerPort": 7001 }, { "hostPort": 7199, "protocol": "tcp", "containerPort": 7199 }, { "hostPort": 9042, "protocol": "tcp", "containerPort": 9042 }, { "hostPort": 9160, "protocol": "tcp", "containerPort": 9160 }, { "hostPort": 8778, "protocol": "tcp", "containerPort": 8778 } ], "command": null, "linuxParameters": null, "cpu": 256, "environment": [ { "name": "VERSION", "value": "latest" } ], "ulimits": null, "dnsServers": null, "mountPoints": [ { "readOnly": null, "containerPath": "/data", "sourceVolume": "66b84f2254c64e2b7430388fe1201c2c" }, { "readOnly": null, "containerPath": "/journal", "sourceVolume": "journal_66b84f2254c64e2b7430388fe1201c2c" } ], "workingDirectory": null, "dockerSecurityOptions": null, "memory": 770, "memoryReservation": 512, "volumesFrom": [], "image": "cloudstax/firecamp-cassandra:3.11", "disableNetworking": null, "healthCheck": null, "essential": true, "links": null, "hostname": null, "extraHosts": null, "user": null, "readonlyRootFilesystem": null, "dockerLabels": null, "privileged": false, "name": "casdb-t1-container" } ], "placementConstraints": [], "memory": null, "taskRoleArn": null, "compatibilities": [ "EC2" ], "taskDefinitionArn": "arn:aws:ecs:us-east-1:709846101695:task-definition/casdb-t1:2", "family": "casdb-t1", "requiresAttributes": [ { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21" }, { "targetId": null, "targetType": null, "value": null, "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" } ], "requiresCompatibilities": [], "networkMode": "host", "cpu": null, "revision": 2, "status": "ACTIVE", "volumes": [ { "name": "66b84f2254c64e2b7430388fe1201c2c", "host": { "sourcePath": "66b84f2254c64e2b7430388fe1201c2c" } }, { "name": "journal_66b84f2254c64e2b7430388fe1201c2c", "host": { "sourcePath": "journal_66b84f2254c64e2b7430388fe1201c2c" } } ] }

JuniusLuo commented 6 years ago

I'm not sure how you created the service. It looks like the service does not exist in the system.

ddt7 commented 6 years ago

Hey, I tried again with less memory and it worked. You run 2 more services, manager and catalog, which each take 128MB, so 770 was probably on the edge. I tried 600 and it runs : )

JuniusLuo commented 6 years ago

Good to know that it worked :) Yes, we split the catalog service out. So there are now 2 services that each take 128MB, instead of 1 service with 256MB as before.
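The arithmetic behind "on the edge" can be sketched as follows. It assumes the nominal 1 GiB (1024MB) of a t2.micro and ignores what the OS and the ECS agent use, so the real headroom is smaller; the function name remaining_mb is hypothetical.

```shell
# Memory left on an instance after subtracting each listed reservation.
# First argument: nominal instance memory in MB; remaining arguments:
# memory taken by other services in MB.
remaining_mb() {
  total_mb=$1
  shift
  for used_mb in "$@"; do
    total_mb=$((total_mb - used_mb))
  done
  echo "$total_mb"
}

remaining_mb 1024 128 128   # t2.micro minus the manager and catalog services
```

With these inputs the result is 768, just below the 770 passed as -max-memory, while 600 leaves some room.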

ddt7 commented 6 years ago

Where can I find more documentation on adding a service like RabbitMQ to FireCamp, unless you plan to add it soon? And where can I find more documentation on how scaling the number of nodes up or down works for Cassandra/ElasticSearch/Redis?

JuniusLuo commented 6 years ago

For adding a new service, you could refer to the Cassandra service. You will need to add a few things:

  1. Generate the create-service request, plus the service initialization request if the service requires additional initialization after all replica containers are running. You could refer to cascatalog.go. The detailed service configuration parameters are stored in the service configuration file or the member configuration file. Refer to the genServiceConfigs() and GenReplicaConfigs() functions in cascatalog.go.

  2. Add the service Dockerfile and entrypoint.sh. You could refer to https://github.com/cloudstax/firecamp/tree/master/catalog/cassandra/3.11/dockerfile. The Dockerfile could build on an image from Docker Hub, which usually declares a volume, such as "VOLUME /var/lib/cassandra" in the official Cassandra image. However, Docker does not allow overriding a volume declared in the parent image. To make sure data is not written to the temporary volume, it is better to remove the default volume, explicitly specify the volume directory, and check in entrypoint.sh whether the volume is mounted.

  3. Add the service creation function to the catalog service. You could refer to the catalog Cassandra functions. The function simply checks the request, creates the service in the FireCamp management service, and runs the initialization task if necessary.
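The mount check mentioned in step 2 could look roughly like this in an entrypoint.sh. This is a sketch rather than the check the FireCamp Cassandra image actually performs; it assumes a Linux container where /proc/mounts is readable, and the function name is_mounted and its optional second argument are hypothetical.

```shell
# Return 0 if the given directory appears as a mount point in the mounts file.
# The second argument defaults to /proc/mounts and exists only so the
# function can be tried against a sample file.
is_mounted() {
  dir=$1
  mounts_file=${2:-/proc/mounts}
  # Field 2 of each line in /proc/mounts is the mount point.
  awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts_file"
}

# In entrypoint.sh, before starting the service (container paths taken from
# the task definitions in this thread):
#   is_mounted /data    || { echo "/data is not mounted" >&2; exit 1; }
#   is_mounted /journal || { echo "/journal is not mounted" >&2; exit 1; }
```

Failing early like this keeps the service from silently writing its data into the container's temporary filesystem.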

JuniusLuo commented 6 years ago

For scaling up the service, you could refer to scaling up Cassandra. Currently you need to use nodetool to check that the scaling has fully completed. Scaling down is currently not supported. Scaling down requires Cassandra to recover the down replica from other nodes, which could be a heavy operation if the system has lots of data. Scaling for ElasticSearch and Redis is not supported yet.
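The nodetool check could be scripted along these lines. It is a sketch that assumes the usual nodetool status layout, where each node line starts with a two-letter code (U or D for Up/Down, then N, L, J or M for Normal/Leaving/Joining/Moving); the function name all_nodes_up_normal is hypothetical.

```shell
# Succeeds only if the nodetool status output read from stdin lists at
# least one node and every listed node is UN (Up/Normal).
all_nodes_up_normal() {
  awk '
    $1 ~ /^[UD][NLJM]$/ { nodes++; if ($1 != "UN") pending++ }
    END { exit !(nodes > 0 && pending == 0) }
  '
}

# Usage on a Cassandra node, polling until the new replicas finish joining:
#   until nodetool status | all_nodes_up_normal; do sleep 30; done
```

If JMX authentication is enabled, as with the -jmx-user and -jmx-passwd options used earlier in this thread, nodetool will also need the matching credentials.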

jazzl0ver commented 4 years ago

For future reference: I was able to fix the "create ...: VolumeDriver.Create: Create, GetServiceAttr error DB RecordNotFound req {... map[]}" error by deleting the task definition for that service and re-creating the service from scratch.