awslabs / service-workbench-on-aws

A platform that provides researchers with one-click access to collaborative workspace environments operating across teams, universities, and datasets while enabling university IT stakeholders to manage, monitor, and control spending, apply security best practices, and comply with corporate governance.
Apache License 2.0
178 stars 119 forks

Increase the number of supported studies per workspace #699

Open aGutierrezSacristan opened 3 years ago

aGutierrezSacristan commented 3 years ago

Describe the bug: When creating a workspace in SWB with more than 10 associated studies, the workspace is not created and stays permanently in "pending" status, without even the option to terminate it.

To Reproduce: Steps to reproduce the behavior:

  1. Select 44 studies.
  2. Create a SageMaker Notebook-v3 workspace with the Large-Sized SageMaker configuration (instance type: ml.t2.2xlarge, vCPUs: 8, memory: 32 GiB, auto-stop idle time: 60 min).

Expected behavior: The workspace is created and becomes available. If there is a maximum number of studies, or some other related issue, SWB should display an error message and allow the workspace to be terminated.

Screenshots: [screenshot: Screen Shot 2021-09-08 at 2 11 23 PM]

Versions: Release version installed: 3.3.1

Additional context: It worked when selecting 10 studies, but the workspace always stays in pending when selecting all the studies we require (in this specific case, around 40).

jol-ster commented 3 years ago

Thank you for reporting this. Can you please let me know which category of studies is involved? There are three types: 1) My Studies, 2) Organizational Studies, and 3) studies that are "Data Sources" from the Bring Your Own Bucket registration.

If it is a mix of types, that would be helpful to understand as well.

aGutierrezSacristan commented 3 years ago

Thanks @jol-ster. They are Organization studies.

SanketD92 commented 3 years ago

Hello @aGutierrezSacristan, could you check a few things in your account that would help us debug further:

  1. BYOB (or Data Source) studies are also listed under the Organization studies category in SWB, but users cannot view their files from the SWB UI. SWB handles these studies differently from the rest of the Organization studies. Could you confirm that the studies you selected did not include any BYOB studies?
  2. Could you check whether there was any error on the CloudFormation stack corresponding to this workspace? Its name usually starts with 'SC-', and it is located in the hosting account (i.e., the account linked to the SWB project).
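
For the second check, here is a minimal sketch of pulling the failure reasons out of the workspace's 'SC-...' stack. It assumes the AWS SDK for JavaScript v2 `CloudFormation` client (the `aws-sdk` package that also appears in the stack trace later in this thread), created with credentials for the hosting account. `failureReasons` and `fetchFailureReasons` are hypothetical helpers, not part of SWB.

```javascript
// Hypothetical helpers, not part of SWB. Assumes `cfn` is an AWS SDK for
// JavaScript v2 client, e.g. new AWS.CloudFormation({ region }), with
// credentials for the hosting account.

// Keep only events whose status ends in _FAILED (CREATE_FAILED,
// DELETE_FAILED, ...) and format their reasons for a bug report.
function failureReasons(events) {
  return events
    .filter((e) => typeof e.ResourceStatus === 'string' && e.ResourceStatus.endsWith('_FAILED'))
    .map((e) => `${e.LogicalResourceId}: ${e.ResourceStatus} - ${e.ResourceStatusReason || 'no reason given'}`);
}

// Page through DescribeStackEvents for the workspace's 'SC-...' stack.
async function fetchFailureReasons(cfn, stackName) {
  const events = [];
  let nextToken;
  do {
    // Only send NextToken once the service has returned one.
    const params = nextToken ? { StackName: stackName, NextToken: nextToken } : { StackName: stackName };
    const page = await cfn.describeStackEvents(params).promise();
    events.push(...(page.StackEvents || []));
    nextToken = page.NextToken;
  } while (nextToken);
  return failureReasons(events);
}
```

Filtering on the `_FAILED` suffix keeps the output short; the first failed resource is usually the one whose `ResourceStatusReason` explains the rollback.

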
carvantes commented 3 years ago

Hi,

There are 2 issues:

1.- Mounting many studies causes errors. There isn't a specific maximum number of studies, but the S3 locations of the studies are concatenated into a single parameter, which eventually exceeds the maximum CloudFormation (CFN) string parameter length of 4096 characters.

https://github.com/awslabs/service-workbench-on-aws/blob/f81a04f777601060c4f741781918374aef0dba25/addons/addon-environment-sc-api/packages/environment-sc-workflow-steps/lib/steps/launch-product/launch-product.js#L126

```
]' at 'provisioningParameters.8.member.value' failed to satisfy constraint: Member must have length less than or equal to 4096
    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'ValidationException',
  time: 2021-09-10T07:51:23.512Z,
  requestId: '2d49f856-62c8-4ebc-bce1-31c97f64d6ae',
  statusCode: 400,
  retryable: false,
  retryDelay: 848.8822964893578
```
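
To make the failure mode concrete, here is an illustrative sketch of how a single string parameter that carries one entry per study grows linearly until it crosses the 4096-character limit reported above. The entry fields (`id`, `bucket`, `prefix`, `kmsArn`), the sample values, and the helper names are all hypothetical; this is not SWB's actual S3 mounts format or code.

```javascript
// Illustrative only: field names and values are hypothetical and do not
// reproduce SWB's actual S3 mounts format.
const MAX_PARAM_LENGTH = 4096; // limit from the ValidationException above

// Serialize one entry per study into a single string parameter.
function buildMountsParam(studies) {
  return JSON.stringify(
    studies.map((s) => ({ id: s.id, bucket: s.bucket, prefix: s.prefix, kmsArn: s.kmsArn }))
  );
}

// Fail fast with a readable message instead of letting the service reject it.
function checkMountsParam(value) {
  if (value.length > MAX_PARAM_LENGTH) {
    throw new Error(`S3 mounts parameter is ${value.length} characters; the limit is ${MAX_PARAM_LENGTH}`);
  }
  return value;
}

// Hypothetical study generator for the demonstration.
const makeStudy = (i) => ({
  id: `study-${i}`,
  bucket: 'example-swb-studydata-bucket',
  prefix: `studies/Organization/study-${i}/`,
  kmsArn: 'arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000',
});
const studies = (n) => Array.from({ length: n }, (_, i) => makeStudy(i));

console.log(buildMountsParam(studies(10)).length); // well under 4096
console.log(buildMountsParam(studies(44)).length); // well over 4096
```

With these sample entries (roughly 185 characters each), 10 studies fit comfortably while 44 studies produce about twice the allowed length, which matches the pattern described in the report.

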

2.- There is a bug in the error handling of the provisioning workflow. We attempt to store the error message in DynamoDB, but the message is huge (it contains the string that is longer than 4096 characters), while the schema limits the error field to 2048 characters. The workflow crashes and the workspace is left in the "pending" status forever. https://github.com/awslabs/service-workbench-on-aws/blob/b799b5bfb1305c07b04742725697c6b0e5f47bdc/addons/addon-base-raas/packages/base-raas-services/lib/schema/update-environment-sc.json#L27

```
2021-09-10T07:51:23.571Z b6acd231-99f0-4238-ab49-84114a5a20b4 ERROR
{
  "solutionName": "swb",
  "envType": "e2eTest",
  "envName": "e2etest",
  "logLevel": "error",
  "boom": true,
  "code": "badRequest",
  "status": 400,
  "safe": true,
  "payload": {
    "validationErrors": [
      {
        "keyword": "maxLength",
        "dataPath": ".error",
        "schemaPath": "#/properties/error/maxLength",
        "params": { "limit": 2048 },
        "message": "should NOT be longer than 2048 characters"
      }
    ]
  },
  "msg": "Input has validation errors",
  "stack": "Error: Input has validation errors\n at Boom.badRequest (/var/task/src/lambdas/workflow-loop-runner/webpack:/home/runner/work/service-workbench-on-aws/service-workbench-on-aws/addons/addon-base/packages/services-container/lib/boom.js:48:38)\n at JsonSchemaValidationService.ensureValid (/var/task"
}
```
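
One defensive pattern for this class of bug is to cap the error text before it reaches the schema-validated update, so the workflow can record the failure instead of crashing. The sketch below assumes only what the log shows (a 2048-character `maxLength` on `error`); `toStorableError` is a hypothetical helper and is not necessarily what the fix in the PR below implements.

```javascript
// Hypothetical helper, not necessarily what the actual fix does. Assumes the
// stored `error` field is capped at 2048 characters, as the maxLength in the
// log says.
const MAX_ERROR_LENGTH = 2048;
const TRUNCATION_SUFFIX = '... [truncated]';

// Reduce any thrown value to a message that always passes the length check,
// so the workflow can mark the workspace as failed instead of crashing.
function toStorableError(err, maxLength = MAX_ERROR_LENGTH) {
  const message = err && err.message ? String(err.message) : String(err);
  if (message.length <= maxLength) return message;
  return message.slice(0, maxLength - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX;
}

// Example: an oversized error like the one above is cut to exactly the cap.
const huge = new Error(`Member must have length <= 4096: ${'x'.repeat(5000)}`);
console.log(toStorableError(huge).length); // 2048
console.log(toStorableError(new Error('short message'))); // short message
```

Keeping the suffix inside the limit (rather than appending it after slicing to `maxLength`) guarantees the stored value never exceeds the cap.

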

Fixing the error-handling bug is the higher priority. After that, we can evaluate alternatives that allow mounting more studies.

carvantes commented 3 years ago

PR to fix the error handling bug: https://github.com/awslabs/service-workbench-on-aws/pull/705

avillach commented 3 years ago

Hi, looking at the S3 mount parameter, we see that one study takes on average 256 characters, so on average we can mount at most 16 studies (4096 / 256 = 16), which is too low. What is the timeline to increase this? Thanks, Paul
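
The arithmetic above can be written as a back-of-the-envelope check. The sketch uses only the two figures from this thread (the 4096-character limit and about 256 characters per study); `overheadChars` is a hypothetical fixed cost for wrapper characters such as brackets, included to show that any overhead pushes the average below 16.

```javascript
// Capacity estimate from the figures in this thread: a 4096-character
// parameter limit and ~256 characters per study. `overheadChars` is a
// hypothetical fixed cost for wrapper characters.
const PARAM_LIMIT = 4096;

function maxStudies(avgCharsPerStudy, overheadChars = 0) {
  return Math.floor((PARAM_LIMIT - overheadChars) / avgCharsPerStudy);
}

console.log(maxStudies(256)); // 16
console.log(maxStudies(256, 2)); // 15
console.log(40 * 256); // 10240, i.e. 2.5x the limit for ~40 studies
```

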

maghirardelli commented 2 years ago

Hi @avillach !

We are going to look into alternatives that would allow more than 16 studies on average. We don't have a timeline for this yet.

Thanks! Marianna