big-data-lab-umbc / Reproducible_and_portable_app_in_cloud

A toolkit to deploy, execute, analyze, and reproduce big data analytics automatically in the cloud.

Stuck at Loading function in cloudwatch #15

Open C4rohan opened 1 year ago

C4rohan commented 1 year ago

AWS app template --> GPU-CloudPhasePredictionDAMA-WL

Docker --> GPU-CloudPhasePredictionDAMA-WL

App Settings --> GPU-CloudPhasePredictionDAMA-WL

Here is the link to the last CloudWatch run --> Link

This time the Docker image pulls successfully and execution moves on to the next commands, but the code gets stuck at the loading function. Can you please help me understand from the CloudWatch log whether the execution is successful or buggy? Also, is there any way to see the commands running live on the instance while the code is executing? Or might I just have to add a few lines to /lamda/app.py to print the output to the screen?

Here are some screenshots of CloudWatch -->

image image

Here is a screenshot of the output on the local terminal -->

image
starlyxxx commented 1 year ago

I think your app got stuck on the command "wget -P /home/ubuntu/ https://ai-4-atmosphere-remote-sensing.s3.amazonaws.com/cloud-phase-prediction-main.zip && unzip /home/ubuntu/cloud-phase-prediction-main.zip -d /home/ubuntu/ && mkdir -p /home/ubuntu/data/output_data". In your S3 bucket, the permission of the file "cloud-phase-prediction-main.zip" should be set to public.
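A quick way to test the "public" assumption is an anonymous HEAD request against the object URL. This is a minimal sketch using only the Python standard library; `head_status` and `describe_status` are hypothetical helper names, and the status hints are assumptions about how S3 usually responds to unauthenticated requests (a private object typically returns 403).

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status code of an anonymous HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def describe_status(code):
    """Translate a status code into a hint about the object (hypothetical helper)."""
    if code == 200:
        return "publicly readable"
    if code == 403:
        return "access denied (object or bucket is probably not public)"
    if code == 404:
        return "not found (check the bucket name and key)"
    return f"unexpected status {code}"


# Example (needs network access):
# url = "https://ai-4-atmosphere-remote-sensing.s3.amazonaws.com/cloud-phase-prediction-main.zip"
# print(describe_status(head_status(url)))
```

If this reports "publicly readable" but the app still hangs, the wget step is probably not the culprit.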

I suggest logging into the created instance and debugging each command (like the command above) before you add it to app.py. CloudWatch just prints out the result of every command for the serverless function. You can of course add a few more lines to /lamda/app.py to print out more information about your code.
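For debugging on the instance, it can help to split a long `&&` chain into separate steps and print the output and exit code of each one. The sketch below is one possible way to do that with `subprocess`; `run_steps` is a hypothetical helper and is not part of the toolkit.

```python
import subprocess


def run_steps(commands):
    """Run shell commands one at a time, printing output; stop at the first failure.

    Returns a list of (command, exit_code) pairs for the commands that ran.
    """
    results = []
    for command in commands:
        print(f"$ {command}", flush=True)
        completed = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        if completed.stdout:
            print(completed.stdout, end="")
        if completed.stderr:
            print(completed.stderr, end="")
        print(f"[exit code {completed.returncode}]", flush=True)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            break  # later steps depend on this one, as with &&
    return results


# Example, using the steps from the command above:
# run_steps([
#     "wget -P /home/ubuntu/ https://ai-4-atmosphere-remote-sensing.s3.amazonaws.com/cloud-phase-prediction-main.zip",
#     "unzip /home/ubuntu/cloud-phase-prediction-main.zip -d /home/ubuntu/",
#     "mkdir -p /home/ubuntu/data/output_data",
# ])
```

The last entry in the returned list shows which step failed and with what exit code.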

C4rohan commented 1 year ago

Thanks for your quick response. The bucket was already public. I retyped the command Sample.Event.json and it worked. I no longer get the same error now. CloudWatch link --> Link

Here is a screenshot of the CloudWatch output -->

image image image

Does the above output mean that I should now check whether the docker run command works? Thank you so much again.

C4rohan commented 1 year ago

Thank you again @starlyxxx. I think I had selected the wrong AWS image; I have also added the GPU-related issue here --> Link. Also, there should be an updated nvidia docker command, something like this --> docker run --runtime=nvidia -v ....

But the issue still persists...(Ssm run command: echo {\"Configurations\": {\"awsRegion\": \"us-west-2\", \"ec2KeyPath\": \"/Users/rohansalvi/Documents/Reproducible_and_portable_app_in_cloud/ConfigTemplate/Rohan_west.pem\", \"bill\": {\"instancetype\": \"p3.8xlarge\", \"EC2_price\": 24.48, \"EBS_price\": 0.01, \"data_size\": 0.34}, \"instance_num\": 1, \"gpu_num\": 1, \"source_data\": {\"bucketname\": \"cloud-phase-prediction-data\", \"prefix\": \"cloud-phase-prediction-main.zip\", \"filename\": \"cloud-phase-prediction-main.zip\", \"version\": \"null\"}, \"output_result\": {\"bucketname\": \"portableapp-west\", \"prefix\": \"84b0f1bd-bdcd-4177-a566-1bb8502c08a5\", \"filename\": \"result.txt\"}, \"output_event\": {\"bucketname\": \"portableapp-west\", \"prefix\": \"84b0f1bd-bdcd-4177-a566-1bb8502c08a5\", \"filename\": \"event.json\"}, \"terminate\": \"aws cloudformation delete-stack --stack-name rpacautoanalytics\", \"docker_image\": \"rohansalvi98/cloudpredictionphasegpu:latest\", \"command_line\": \"docker run --runtime=nvidia -v /home/ubuntu/cloud-phase-prediction-main:/root/CloudPhasePrediction -v /home/ubuntu/output_data:/root/output_data rohansalvi98/cloudpredictionphasegpu:latest sh -c \'conda activate cloud-phase-prediction-env && cd CloudPhasePrediction && python train.py --training_data_path=./example/training_data/ --model_saving_path=/root/output_data'\"}, \"Commands\": {\"bash\": \"wget -P /home/ubuntu/ https://ai-4-atmosphere-remote-sensing.s3.amazonaws.com/cloud-phase-prediction-main.zip && unzip /home/ubuntu/cloud-phase-prediction-main.zip -d /home/ubuntu/ && mkdir -p /home/ubuntu/data/output_data\"}} | tee -a /home/ubuntu/event.json)

C4rohan commented 1 year ago

@starlyxxx Hey, I am not able to figure out the error mentioned above. What could be causing it?

starlyxxx commented 1 year ago

I believe the command in your app should be nvidia-docker run ..., like the command shown here.

For a better understanding of how to run a GPU app step by step, I recommend reading the single-GPU example and the multi-GPU example.
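Since both `nvidia-docker run` and `docker run --runtime=nvidia` come up in this thread, here is a rough sketch for checking which launcher is installed on the instance. `gpu_docker_prefix` is a hypothetical helper. It only checks which binaries are on PATH; it cannot tell whether the nvidia runtime is actually registered with Docker.

```python
import shutil


def gpu_docker_prefix(which=shutil.which):
    """Pick a command prefix for running a GPU container (hypothetical helper).

    Prefers the nvidia-docker wrapper if it is on PATH, otherwise falls back to
    plain docker with the nvidia runtime. Returns None if docker is missing too.
    """
    if which("nvidia-docker"):
        return ["nvidia-docker", "run"]
    if which("docker"):
        return ["docker", "run", "--runtime=nvidia"]
    return None


print(gpu_docker_prefix())
```

The `which` parameter is only there so the lookup can be replaced when trying the function off the instance.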

C4rohan commented 1 year ago

@starlyxxx There was an error with Docker execution using that command, and the solutions I found online suggested this alternative command; that's why I switched to docker run --runtime=nvidia. I ran the command manually on the instance and it runs.

here is the updated one --> docker run --runtime=nvidia -v /home/ubuntu/cloud-phase-prediction-main:/root/CloudPhasePrediction -v /home/ubuntu/output_data:/root/output_data rohansalvi98/cloudpredictionphasegpu:latest sh -c cd CloudPhasePrediction && python3 train.py --training_data_path='./example/training_data/' --model_saving_path='/root/output_data'

starlyxxx commented 1 year ago

Try replacing all apostrophes with backticks.
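A separate thing worth checking is the quoting around the `sh -c` script. In the updated command as pasted, the script is unquoted; if it was really typed that way (and the quotes were not just lost in formatting), the host shell splits at `&&`, so only `cd` reaches `sh -c` and `python3 train.py` runs on the host rather than in the container. One way to avoid hand-quoting is to build the command from a list of arguments and let `shlex.quote` do the quoting. This is only a sketch: `docker_sh_command` is a hypothetical helper, and the image, paths, and script are copied from this thread.

```python
import shlex


def docker_sh_command(image, script, volumes=(), runtime="nvidia"):
    """Build a docker run command line whose script is one quoted sh -c argument."""
    args = ["docker", "run", f"--runtime={runtime}"]
    for host_path, container_path in volumes:
        args += ["-v", f"{host_path}:{container_path}"]
    args += [image, "sh", "-c", script]
    # shlex.quote wraps any argument containing spaces or shell operators.
    return " ".join(shlex.quote(arg) for arg in args)


script = (
    "cd CloudPhasePrediction && python3 train.py "
    "--training_data_path=./example/training_data/ "
    "--model_saving_path=/root/output_data"
)
command = docker_sh_command(
    "rohansalvi98/cloudpredictionphasegpu:latest",
    script,
    volumes=[
        ("/home/ubuntu/cloud-phase-prediction-main", "/root/CloudPhasePrediction"),
        ("/home/ubuntu/output_data", "/root/output_data"),
    ],
)
print(command)
```

The printed command can be pasted into a shell on the instance. If it later goes into a JSON field such as command_line, `json.dumps` can handle the additional escaping.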