bentoml / google-compute-engine-deploy

Default disk size is too small for larger models (e.g. Transformers) #12

Open pjchungmd opened 1 year ago

pjchungmd commented 1 year ago

Issue

When deploying a pretrained model from Hugging Face, in particular one that is downloaded after the container has been set up, the default disk size of 10 GB is too small and causes unexpected errors.

Possible Solution

In the terraform_default.tf file, we can change the boot_disk section from:

resource "google_compute_instance" "vm" {
  project                   = var.project_id
  name                      = "${var.deployment_name}-instance"
  machine_type              = var.machine_type
  zone                      = var.zone
  allow_stopping_for_update = true

  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
    }
  }
  ...

to something like:

resource "google_compute_instance" "vm" {
  project                   = var.project_id
  name                      = "${var.deployment_name}-instance"
  machine_type              = var.machine_type
  zone                      = var.zone
  allow_stopping_for_update = true

  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
      size  = 20
    }
  }
  ...

However, this may be wasteful for most use cases. Another possibility is adding size to OPERATOR_SCHEMA in operator_config.py, or just updating the README to tell users to set the size if they run into problems.
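If the size were exposed as a setting, the Terraform side could look roughly like the sketch below. The variable name boot_disk_size_gb is hypothetical and does not exist in the operator today, and the matching OPERATOR_SCHEMA entry is not shown. The default of 10 is assumed to match the current disk size, so existing deployments would not change.

```hcl
# Hypothetical variable (not part of the operator today): boot disk size in GB.
variable "boot_disk_size_gb" {
  type        = number
  default     = 10
  description = "Boot disk size in GB. Increase for models that download large weights after startup."
}

resource "google_compute_instance" "vm" {
  # ... other arguments unchanged ...

  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
      # Falls back to the 10 GB default unless the user overrides it.
      size  = var.boot_disk_size_gb
    }
  }
}
```

Users who hit disk-full errors would then only need to override this one value instead of editing the generated Terraform file.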

jjmachan commented 1 year ago

I guess the sweet spot is something like

  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
      # default disk size (in GB)
      size  = 10
    }
  }

And document this in the README. That way users know exactly where to modify main.tf directly if they run into the same disk-full issues. Something like https://github.com/bentoml/aws-ec2-deploy#troubleshooting could help too.

What do you think @pjchungmd ?