SkyTruth / cerulean-cloud

All cloud services including inference and database structure
Apache License 2.0

Setup Cloud Run for per offset tile segmentation #21

Closed rodrigoalmeida94 closed 2 years ago

rodrigoalmeida94 commented 2 years ago
rbavery commented 2 years ago

We have a 3-band icevision model in pytorch pth format that you can work with to develop the serving for the tile segmentation: https://console.cloud.google.com/storage/browser/ceruleanml/experiments/cv2/05062022_ep10?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=cerulean-338116&prefix=&forceOnObjectsSortingFiltering=false

This isn't a high-performing model, so it shouldn't be used to evaluate prediction quality, but it can be used to test serving for 3-band images.

rbavery commented 2 years ago

Rodrigo will set up an example Cloud Run function with this test model to test our assumptions about inference time.

rbavery commented 2 years ago

Based on @rodrigoalmeida94's findings on the cost of running Cloud Run inference without classification, we can decide whether it is worth building a classification-model Cloud Run function. cc @jonaraphael

rodrigoalmeida94 commented 2 years ago

@rbavery I wanted to try the model above to check how I should format the inputs, and when I tried to load it on my local machine I got the following error:

import torch
model = torch.load("/Users/rodrigoalmeida/cerulean-cloud/cerulean_cloud/cloud_run_offset_tiles/model/experiments_cv2_05062022_ep10_05062022_ep10.pth")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rodrigoalmeida/.virtualenvs/cerulean-cloud/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/Users/rodrigoalmeida/.virtualenvs/cerulean-cloud/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/Users/rodrigoalmeida/.virtualenvs/cerulean-cloud/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1177, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BackboneWithFPN' object has no attribute 'param_groups'

Should I be loading the model in some other way using icevision?

I'm guessing we may need to save the model like here so we don't have to pass the model architecture.
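For reference, the two usual ways to avoid pickling the full class look roughly like this (a sketch only; torch.save(model) pickles the whole object, so torch.load needs the exact class code importable, while these two options don't, and build_model here is a hypothetical constructor):

import torch

# Option 1: save only the weights; you still rebuild the architecture in code.
torch.save(model.state_dict(), "weights.pth")
rebuilt = build_model()                        # hypothetical constructor
rebuilt.load_state_dict(torch.load("weights.pth"))

# Option 2: TorchScript; the saved file is self-contained and loads with torch alone.
scripted = torch.jit.script(model)
scripted.save("model_scripted.pt")
loaded = torch.jit.load("model_scripted.pt")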

rodrigoalmeida94 commented 2 years ago

Because I was blocked by the issue above but wanted to show you inference running on Cloud Run, I went ahead and created another Cloud Run function with what I included in this Gist: https://gist.github.com/rodrigoalmeida94/3c2f5f96666bd23374e28f9cc31449cc

The model.pt file that is referenced can be found in this file in GCS. I generated this "nonsense" model with:

import torch
from torchvision import models

model = models.resnet18(pretrained=True)  # any pretrained model will do here
sm = torch.jit.script(model)              # compile to TorchScript
sm.save("resnet-18.pt")                   # self-contained artifact for serving

The Cloud Run URL is https://torch-inference-5qkjkyomta-ey.a.run.app (metrics: https://console.cloud.google.com/run/detail/europe-west3/torch-inference/metrics?project=cerulean-338116).

Warm-up can take up to 4 s, but once the function is warm we get an inference response in 300-400 ms.
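For reference, the shape of the service is roughly the following (a sketch only, assuming Flask and a JSON payload carrying a base64-encoded serialized tensor; the actual route and field names live in the Gist):

import base64
import io

import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
model = torch.jit.load("resnet-18.pt")  # the scripted model saved above
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Assumes the client sends {"image": <base64 of a torch.save'd tensor>}.
    raw = base64.b64decode(request.json["image"])
    tensor = torch.load(io.BytesIO(raw))
    with torch.no_grad():
        logits = model(tensor)
    return jsonify({"logits": logits.tolist()})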

rbavery commented 2 years ago

Awesome. Assigning to @lillythomas as well to pair up with you when you return. For now we'll use the unet model for testing instead of the icevision model.

rbavery commented 2 years ago

@rodrigoalmeida94 I've PRed code to save and load torch models here: https://github.com/SkyTruth/cerulean-ml/pull/85/files. An example model trained for 1 epoch is in the mounted ceruleanml bucket at /root/data/experiments/cv2/20_May_2022_19_29_39_fastai_unet/tracing_test_1batch_18_512_0.125.pt

The output of a model loaded with torch tracing is the logits with shape [1, 7, 512, 512] (batch of 1, 7 classes, though we will later drop the ambiguous class).

These are softmaxed and then argmaxed to get the per-pixel confidence scores and the maximally confident classes, each with shape [1, 512, 512].
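In code, that post-processing looks roughly like this (a sketch; tracing_model and batch are stand-ins for the loaded traced model and a [1, 3, 512, 512] input tensor):

import torch
import torch.nn.functional as F

with torch.no_grad():
    logits = tracing_model(batch)        # [1, 7, 512, 512]
probs = F.softmax(logits, dim=1)         # per-pixel class probabilities
conf, classes = torch.max(probs, dim=1)  # each [1, 512, 512]: winning-class
                                         # confidence and argmax class index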

I'll add docstrings to the above PR on Monday to wrap it up, but feel free to use these funcs now @rodrigoalmeida94.

rodrigoalmeida94 commented 2 years ago

@rbavery thanks a lot for this! I've tried running this locally on my machine, and when I use the load_tracing_model function I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rodrigoalmeida/cerulean-cloud/cerulean_cloud/cloud_run_offset_tiles/handler.py", line 28, in get_model
    return load_tracing_model("model/model.pt")
  File "/Users/rodrigoalmeida/cerulean-cloud/cerulean_cloud/cloud_run_offset_tiles/handler.py", line 22, in load_tracing_model
    tracing_model = torch.jit.load(savepath)
  File "/Users/rodrigoalmeida/.virtualenvs/cerulean-cloud/lib/python3.8/site-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, Meta, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

CPU: registered at aten/src/ATen/RegisterCPU.cpp:18433 [kernel]
Meta: registered at aten/src/ATen/RegisterMeta.cpp:12703 [kernel]
BackendSelect: registered at aten/src/ATen/RegisterBackendSelect.cpp:665 [kernel]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:47 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at ../aten/src/ATen/ConjugateFallback.cpp:22 [kernel]
Negative: fallthrough registered at ../aten/src/ATen/native/NegateFallback.cpp:22 [kernel]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:10483 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:11423 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:466 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:305 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Could this be a configuration error? It seems to expect a CUDA backend. Ideally we would be able to load this model in an environment that only includes the torch package.

rbavery commented 2 years ago

I'll check on this. I think there may be a setting I didn't use when exporting that controls which device the traced model targets.

rbavery commented 2 years ago

This is actually because I saved the model with tracing on a GPU. I can save it for CPU. Traced models are not device agnostic, so I should include that info in the file name.
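The fix looks roughly like this (a sketch; model stands in for the trained fastai unet loaded on the GPU machine):

import torch

model = model.eval().cpu()                # move weights off the GPU first
example = torch.rand(1, 3, 512, 512)      # dummy CPU batch
traced = torch.jit.trace(model, example)  # ops are now recorded on the CPU device
traced.save("tracing_cpu_test.pt")

torch.jit.load(path, map_location="cpu") can sometimes work around this on the loading side, but traced graphs may bake in device placement, so tracing on CPU is the more robust fix.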

rbavery commented 2 years ago

@rodrigoalmeida94 OK, this model should work; I tested it locally on my mac: /root/data/experiments/cv2/24_May_2022_01_49_56_fastai_unet/tracing_cpu_test_1batch_18_512_0.082.pt

>>> torch.jit.load("../tracing_cpu_test_1batch_18_512_0.082.pt")
RecursiveScriptModule(
  original_name=DynamicUnet
  (layers): RecursiveScriptModule(
    original_name=ModuleList
    (0): RecursiveScriptModule(
      original_name=Sequential
      (0): RecursiveScriptModule(original_name=Conv2d)
      (1): RecursiveScriptModule(original_name=BatchNorm2d)
      ........
rbavery commented 2 years ago

Something to note: if the instance has a GPU and inference is run over a PyTorch dataloader, the dataloader will need to be moved to the CPU, like so:

import os

# load_tracing_model and test_tracing_model_one_batch come from the
# cerulean-ml PR above; dls is the fastai DataLoaders used in training.
experiment_dir = '/root/data/experiments/cv2/24_May_2022_01_49_56_fastai_unet/'
savename = "tracing_cpu_test_1batch_18_512_0.082.pt"
tracing_model = load_tracing_model(os.path.join(experiment_dir, savename))
out_batch_logits = test_tracing_model_one_batch(dls.to('cpu'), tracing_model)
rodrigoalmeida94 commented 2 years ago

I could load the model @rbavery ! Thanks so much 👍

rodrigoalmeida94 commented 2 years ago

@rbavery what is this model expecting as input? I passed a tensor with shape [1,1,512,512] and got RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead. I suppose it wants a 3-band image, but is this the composite of VV and aux datasets, or just the VV band represented as RGB? Since this mostly matters for performance, I'll keep developing with a dummy array for now.

rbavery commented 2 years ago

@rodrigoalmeida94 It's expecting the 3-band input, where the auxiliary datasets are separate channels.
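A placeholder input with the right shape for developing against (a sketch; tracing_model is the loaded traced model, and the real channel contents come from the training pipeline):

import torch

# Dummy 3-channel, 512x512 tile: channel contents here are illustrative
# (e.g. VV backscatter plus two auxiliary rasters stacked by the pipeline).
dummy = torch.rand(1, 3, 512, 512)
with torch.no_grad():
    logits = tracing_model(dummy)  # expected shape: [1, 7, 512, 512]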

rodrigoalmeida94 commented 2 years ago

This notebook demonstrates the working version of the Cloud Run function for inference with the PyTorch model Ryan provided me: https://github.com/SkyTruth/cerulean-cloud/blob/cloud-run-inference/notebooks/test_cloud_run_offset_tile.ipynb

A couple of steps happen here (a rough client-side sketch follows the list):

  1. We read in a tile (a 3-band image, since that's what the model currently supports) and encode it as a base64 string so we can send it over the network.
  2. We place a POST request with the image.
  3. We receive the inference class and confidence as a base64-encoded image. At the moment this takes ~6 s to infer a single tile. We had estimated 5 s per tile in our scenarios, so I'll also look into additional optimizations we can do.
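Roughly, the client side looks like this (a sketch only; the endpoint route and JSON field names are hypothetical, see the notebook for the actual request format):

import base64
import requests

# Read a tile and base64-encode it for transport.
with open("tile.png", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("ascii")}

resp = requests.post(
    "https://torch-inference-5qkjkyomta-ey.a.run.app/predict",  # hypothetical route
    json=payload,
)
resp.raise_for_status()
# The response carries the class/confidence image, also base64-encoded.
pred = base64.b64decode(resp.json()["prediction"])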

The Pulumi deployment code is also in this branch, but I'm facing a pesky race condition while building Docker images. I've reported it at https://github.com/pulumi/pulumi-docker/issues/245 and am waiting for feedback (this deployment I did manually).