Lightning-AI / pytorch-lightning


LightningWork does not move to the GPU #14777

yuvals1 closed this issue 9 months ago

yuvals1 commented 2 years ago


Bug description

I am trying to run an app with a LightningWork of type ServeGradio that should run on my local GPU. I am passing `L.CloudCompute("gpu")` to the LightningWork (I also tried `"cuda"`), but it does not seem to move my model to the GPU. When I try to move the model to the GPU explicitly, I get the following error: `Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method`. When I do not try to move the model to the GPU, the process runs.

How to reproduce the bug

class VideoServeGradio(ServeGradio):

    inputs = gr.Video()
    outputs = "playable_video"

    def __init__(self, cloud_compute, *args, **kwargs):
        super().__init__(*args, cloud_compute=cloud_compute, **kwargs)
        print("cuda", torch.cuda.is_available()) # this prints True

    def run(self):
        super().run()

    def predict(self, video):
        self.model(video)
        inferred_video_path = "./artifacts/out_vids/nonamevid.mp4"  # this is the local path where the inferred video is saved
        return inferred_video_path

    def build_model(self):
        print("cuda:", torch.cuda.is_available()) # this prints True as well
        pipe = MyPipeline(face_geometry_path=None)
        pipe.to("cuda") # this results in an error
        return pipe

class Flow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        print("cuda:::::", torch.cuda.is_available())
        self.serve_work = VideoServeGradio(cloud_compute=L.CloudCompute("gpu"))

    def run(self):
        self.serve_work.run()

    def configure_layout(self):
        tab_2 = {"name": "Interactive demo", "content": self.serve_work}
        return [tab_2]

app = L.LightningApp(Flow(), debug=True)

Error messages and logs


# Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Important info


#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

tchaton commented 2 years ago

Hey @yuvals1.

Thanks for trying Lightning App.

The `Cannot re-initialize CUDA in forked subprocess` error means CUDA is being initialized before the serving subprocess is forked, so the model has to be moved to the GPU after the fork, inside `predict` rather than `build_model`.

Could you try this:

class VideoServeGradio(ServeGradio):

    inputs = gr.Video()
    outputs = "playable_video"

    def run(self):
        super().run()

    def predict(self, video):
        # Move the model to cuda in the predict method.
        model = self.model.cuda()
        video = video.cuda()
        output = model(video)
        return output.cpu().item()

    def build_model(self):
        return MyPipeline(face_geometry_path=None)

class Flow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.serve_work = VideoServeGradio()

    def run(self):
        self.serve_work.run()

    def configure_layout(self):
        tab_2 = {"name": "Interactive demo", "content": self.serve_work}
        return [tab_2]

app = L.LightningApp(Flow(), debug=True)
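
As a small refinement (a sketch only, assuming `MyPipeline` is a regular `torch.nn.Module` and `video` is already a tensor), the GPU copy can be cached so the host-to-device transfer happens only on the first request:

    def predict(self, video):
        # predict runs inside the forked serving process; as long as CUDA
        # was never initialized before the fork, initializing it here works.
        # Cache the GPU copy so the transfer is paid only once.
        if getattr(self, "_cuda_model", None) is None:
            self._cuda_model = self.model.cuda()
        video = video.cuda()
        output = self._cuda_model(video)
        return output.cpu().item()
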
yuvals1 commented 2 years ago

Hey @tchaton, thanks for the response. I tried your suggestion, and unfortunately the process now cannot find CUDA for some reason: `CUDA driver initialization failed, you might not have a CUDA gpu`. Any ideas why?

tchaton commented 2 years ago

Hey @yuvals1. Some progress, and a different error.

Mind trying this?

    def predict(self, video):
        # Move the model to cuda in the predict method.
        torch.cuda.set_device(torch.device('cuda:0'))
        model = self.model.cuda()
        video = video.cuda()
        output = model(video)
        return output.cpu().item()
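
`torch.cuda.set_device` makes `cuda:0` the default device for subsequent CUDA allocations in the serving process.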

cc @awaelchli

awaelchli commented 2 years ago

Hi @yuvals1 Is PyTorch working fine on that system otherwise? Please check that this works:

python -c "import torch; torch.rand(2).to('cuda:0')" 

Because the error

CUDA driver initialization failed, you might not have a CUDA gpu.

would suggest that your system/display driver is perhaps outdated?
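
If the driver is the suspect, running `nvidia-smi` on that machine will show the installed driver version and whether the GPU is visible at all.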

As @tchaton said, the error

Cannot re-initialize CUDA in forked subprocess.

is from torch: it seems that `gradio.Interface().launch()`, which we use under the hood, forks a subprocess to serve the app. This is a limitation of torch, and thus all CUDA operations should be performed inside the `predict` function. Hmm, I'm not sure what we could do here.
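
For anyone who wants to see the underlying PyTorch behaviour in isolation, here is a minimal standalone sketch (no Lightning or Gradio involved): once the parent process has initialized CUDA, a forked child fails with the re-initialization error, while a spawned child starts fresh and works.

import torch
import torch.multiprocessing as mp

def worker():
    # The first CUDA call in the child initializes CUDA there.
    print(torch.rand(2, device="cuda:0"))

if __name__ == "__main__":
    torch.cuda.init()  # the parent touches CUDA first
    # With "fork", worker() raises: Cannot re-initialize CUDA in forked subprocess.
    # With "spawn", the child gets a fresh interpreter and succeeds.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=worker)
    p.start()
    p.join()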

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!