big-data-lab-umbc / Reproducible_and_portable_app_in_cloud

A toolkit to deploy, execute, analyze, and reproduce big data analytics automatically in the cloud.

Cannot see the results on the instance when creating a new app #13

Status: Open. C4rohan opened this issue 1 year ago.

C4rohan commented 1 year ago

I am trying to create a new AWS serverless app. Here are the relevant resources:
- The AWS serverless app I uploaded to Git: Link
- The Dockerfile for the app: Link
- The config files in examples: Link

I can run the app successfully. There are no errors during resource creation, and the output is similar to the author's output.

[Screenshot of the output]

The problem is that I don't understand how ./AwsServerlessTemplate/NewAppTemplate/lambda works. I managed to figure out the settings by looking at other applications.

Steps taken to execute the application

Output observations and questions:

C4rohan commented 1 year ago

The Docker image and the commands have been tested on a separate test instance and run correctly. I believe the current problem is the same: the commands and bootstrap script I load do not execute on the instance that the app creates. I think the main reason is the AWS resource setup linked to deploy_config.json.

starlyxxx commented 1 year ago

Hi @C4rohan, thanks for your work on this repository. Based on your screenshot, I found the corresponding CloudWatch log (link). The log shows that the serverless pipeline cannot find the masterInstanceId.

In deploy_config.json, you changed the parameter "DLSecurityGroup" to "sg-0f81c232cd437d43f". However, in app.py, I hardcoded "DLSecurityGroup" as "distributed_dl_starly". I believe editing app.py is the right direction for solving these issues.
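One way to avoid this kind of mismatch is to have app.py read the value from deploy_config.json instead of hardcoding it. The sketch below assumes deploy_config.json is a flat JSON object with a "DLSecurityGroup" key; the function names and the fallback constant are hypothetical, not part of the repository. It also accounts for the fact that "sg-0f81c232cd437d43f" is a security group ID while "distributed_dl_starly" looks like a group name: boto3's EC2 `run_instances` takes IDs through `SecurityGroupIds` and names through `SecurityGroups` (names resolve only in the default VPC), so the sketch picks the keyword from the "sg-" prefix. How app.py actually passes the group is an assumption here.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Hypothetical fallback mirroring the value currently hardcoded in app.py.
DEFAULT_SECURITY_GROUP = "distributed_dl_starly"


def security_group_from_config(config: Dict[str, Any]) -> str:
    """Return DLSecurityGroup from a parsed config, or the fallback if absent."""
    return config.get("DLSecurityGroup", DEFAULT_SECURITY_GROUP)


def load_security_group(config_path: Union[str, Path]) -> str:
    """Read deploy_config.json (assumed flat JSON object) and extract the group."""
    with open(config_path, "r", encoding="utf-8") as fh:
        return security_group_from_config(json.load(fh))


def security_group_kwargs(group: str) -> Dict[str, List[str]]:
    """Build the security-group keyword argument for ec2.run_instances.

    IDs ("sg-...") go in SecurityGroupIds; plain names go in SecurityGroups,
    which EC2 only resolves for the default VPC.
    """
    if group.startswith("sg-"):
        return {"SecurityGroupIds": [group]}
    return {"SecurityGroups": [group]}
```

In app.py this could be spliced into an existing call as `ec2.run_instances(..., **security_group_kwargs(load_security_group("deploy_config.json")))`, so the config file stays the single source of truth for the group.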

Also, please try using CloudWatch to debug the commands that run on the instance the app creates. It has been helpful every time I create a new AWS serverless app.
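As a rough sketch of that debugging step, CloudWatch Logs can be searched programmatically with boto3's `filter_log_events`, following `nextToken` until all matching events are returned. The function below takes the client as a parameter so it can be exercised without AWS; the log group name and search term in the usage note are assumptions, not values taken from the repository.

```python
from typing import Any, Dict, Iterator, Optional


def find_log_messages(logs_client: Any, log_group: str, pattern: str,
                      start_time_ms: Optional[int] = None) -> Iterator[str]:
    """Yield messages from a CloudWatch Logs group that match a filter pattern.

    logs_client is expected to behave like boto3.client("logs"); results are
    paginated, so we keep calling filter_log_events while a nextToken is returned.
    """
    kwargs: Dict[str, Any] = {"logGroupName": log_group, "filterPattern": pattern}
    if start_time_ms is not None:
        kwargs["startTime"] = start_time_ms  # epoch milliseconds
    while True:
        response = logs_client.filter_log_events(**kwargs)
        for event in response.get("events", []):
            yield event["message"]
        token = response.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
```

Usage, assuming the pipeline's Lambda function logs to the standard `/aws/lambda/<function-name>` group: `for msg in find_log_messages(boto3.client("logs"), "/aws/lambda/<function-name>", "masterInstanceId"): print(msg)`. Output from commands on the EC2 instance itself only reaches CloudWatch if the instance is configured to ship its logs there.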

C4rohan commented 1 year ago

Thank you for helping with the debugging. I have changed "DLSecurityGroup" to "sg-0f81c232cd437d43f". Thank you again for letting me know about CloudWatch.

jianwuwang commented 1 year ago

Thanks for working on this. To add some background: the current example needs only one GPU node. @C4rohan is trying to run Python code serially on one GPU or in parallel on multiple GPUs within the same node.
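For the single-node case, one common framework-independent pattern is to start one process per GPU and restrict each process to its own device with the `CUDA_VISIBLE_DEVICES` environment variable; with one GPU this reduces to a single serial run. This is a sketch of that general technique, not how the toolkit launches jobs. The `--rank` and `--world-size` flags are hypothetical arguments the user's script would have to accept.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def plan_gpu_jobs(script: str, num_gpus: int) -> List[Tuple[List[str], Dict[str, str]]]:
    """Build one (command, environment) pair per GPU on this node.

    Each process sees only its own GPU through CUDA_VISIBLE_DEVICES.
    The --rank/--world-size flags are hypothetical script arguments.
    """
    jobs = []
    for gpu in range(num_gpus):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        cmd = [sys.executable, script, "--rank", str(gpu), "--world-size", str(num_gpus)]
        jobs.append((cmd, env))
    return jobs


def run_gpu_jobs(script: str, num_gpus: int) -> List[int]:
    """Start all jobs concurrently, then wait and return their exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in plan_gpu_jobs(script, num_gpus)]
    return [p.wait() for p in procs]
```

For example, `run_gpu_jobs("train.py", 4)` would run four copies of a hypothetical `train.py`, one per GPU. Frameworks with their own launchers (for example PyTorch's `torchrun`) handle the same setup plus inter-process communication, which matters if the parallel copies need to exchange gradients.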

C4rohan commented 1 year ago

@starlyxxx We have run into another error; here is the link: https://github.com/big-data-lab-umbc/Reproducible_and_portable_app_in_cloud/issues/14