ENCODE-DCC / hic-pipeline

HiC uniform processing pipeline
MIT License

test hic fail on aws #152

Open jarekgeneg opened 2 years ago

jarekgeneg commented 2 years ago

When we run the Hi-C test workflow on AWS, it fails with an error saying a file can't be read from S3.

OS/Platform

Caper configuration file:

```
backend=aws
no-server-heartbeat=True
max-concurrent-workflows=300
max-concurrent-tasks=1000
local-out-dir=/opt/caper/local_out_dir
local-loc-dir=/opt/caper/local_loc_dir
aws-batch-arn=arn:aws:batch:eu-west-2:my_id:job-queue/caper-queue
aws-region=eu-west-2
aws-out-dir=s3://caper-hic
aws-loc-dir=s3://caper-hic/.caper_tmp
cromwell=https://storage.googleapis.com/caper-data/cromwell/cromwell-65-d16af26-SNAP.jar
db=postgresql
postgresql-db-ip=xxxx
postgresql-db-port=5432
postgresql-db-user=xxxxx
postgresql-db-password=xxxxx
postgresql-db-name=xxxxxx
```

Input JSON file: the standard test file `tests/functional/json/test_hic.json`

Error log:

```
Started troubleshooting workflow: id=756ed754-2f3d-46e4-8323-194f21bf11fb, status=Failed
Found failures JSON object.
```

```json
[
  {
    "causedBy": [
      {
        "causedBy": [
          {
            "causedBy": [
              {
                "message": "s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt",
                "causedBy": []
              }
            ],
            "message": "Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt"
          }
        ],
        "message": "[Attempted 1 time(s)] - IOException: Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt"
      },
      {
        "causedBy": [
          {
            "causedBy": [
              {
                "message": "s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt",
                "causedBy": []
              }
            ],
            "message": "Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt"
          }
        ],
        "message": "[Attempted 1 time(s)] - IOException: Could not read from s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-rc.txt"
      }
    ],
    "message": "Workflow failed"
  }
]
```

```
==== NAME=hic.normalize_assembly_name, STATUS=Failed, PARENT= SHARD_IDX=-1, RC=None, JOB_ID=74e5e04d-1af7-4a18-97e7-555fc23f58bd
START=2022-03-15T10:05:48.022Z, END=2022-03-15T10:09:36.113Z
STDOUT=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-stdout.log
STDERR=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-normalize_assembly_name/normalize_assembly_name-stderr.log

==== NAME=hic.get_ligation_site_regex, STATUS=Failed, PARENT= SHARD_IDX=-1, RC=None, JOB_ID=32b91045-cf5f-4bca-8f07-4d266d13f97c
START=2022-03-15T10:05:48.585Z, END=2022-03-15T10:09:26.285Z
STDOUT=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-stdout.log
STDERR=s3://caper-hic/hic/756ed754-2f3d-46e4-8323-194f21bf11fb/call-get_ligation_site_regex/get_ligation_site_regex-stderr.log
```

The machine was provisioned with the `caper aws create env` script.

paul-sud commented 2 years ago

Usually such an error indicates that the job didn't even start executing on Batch. Do you have the backend log available? Looking at your Caper config, you have the following value:

`aws-batch-arn=arn:aws:batch:eu-west-2:my_id:job-queue/caper-queue`

The `my_id` is a bit odd to me; I think you'd usually put your AWS account number there.

jarekgeneg commented 2 years ago

Caper config: yes, there is an account ID and queue number; I just redacted them.
AWS Batch starts a virtual machine and then it just fails with the error.

And about the backend log: where can I find it?

jarekgeneg commented 2 years ago

Log from the backend:

```
cromwell_encodedcc_hic-pipeline_1_11_23f2a62da3521400809715244c3eeb9fcfce048da/default/dce5fe3c972e4392a3777d8b608e9d7d:

2022-03-15T10:08:57.007Z  /bin/bash: /var/scratch/fetch_and_run.sh: Is a directory
@ingestionTime  1647338937082
@log            076634410064:/aws/batch/job
@logStream      cromwell_encodedcc_hic-pipeline_1_11_23f2a62da3521400809715244c3eeb9fcfce048da/default/dce5fe3c972e4392a3777d8b608e9d7d
@message        /bin/bash: /var/scratch/fetch_and_run.sh: Is a directory
@timestamp      1647338937007
```

paul-sud commented 2 years ago

The backend log should be a file named something like get_ligation_site_regex.log. If you look at the workflow metadata JSON it will point you to the right file on S3, in the JSON it should be under the key calls > hic.get_ligation_site_regex > backendLogs > log. This file will contain more details about anything that went wrong on the provisioned machine before the task started executing, for instance any issue with localizing input data.
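To locate that key programmatically, here is a minimal sketch. The sample metadata below is reconstructed from the structure described in this comment (calls > hic.get_ligation_site_regex > backendLogs > log); `backend_log_paths` and the `<workflow-id>` placeholder are mine, not part of Caper or Cromwell.

```python
# Sample structure mirroring Cromwell's workflow metadata JSON; in practice
# you would json.load() the metadata file instead of defining it inline.
metadata = {
    "calls": {
        "hic.get_ligation_site_regex": [
            {
                "backendLogs": {
                    "log": "s3://caper-hic/hic/<workflow-id>/call-get_ligation_site_regex/get_ligation_site_regex.log"
                }
            }
        ]
    }
}

def backend_log_paths(metadata: dict, task_name: str) -> list:
    """Return the backendLogs 'log' path for every attempt of a task."""
    return [
        attempt.get("backendLogs", {}).get("log")
        for attempt in metadata.get("calls", {}).get(task_name, [])
    ]

print(backend_log_paths(metadata, "hic.get_ligation_site_regex"))
```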

jarekgeneg commented 2 years ago

Question:

In `metadata.json`:

```
{ "causedBy": [{ "causedBy": [{ "message": "s3://s3.amazonaws.com/caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt", "causedBy": [] }], "message": "Could not read from s3://caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt: s3://s3.amazonaws.com/caper-hic/hic/bd74874b-ebed-4b73-8058-f484ac8695bc/call-normalize_assembly_name/normalize_assembly_name-rc.txt" }
```

Is the URI `s3://s3.amazonaws.com/caper-hic/...` valid? Searching the AWS docs (https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html#accessing-a-bucket-using-S3-format), AWS doesn't recommend the type of URI that the hic pipeline is using.
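For what it's worth, in a canonical `s3://` URI the host component is the bucket name itself, while `s3.amazonaws.com` is the host of the path-style HTTPS endpoint; the URI in the error message mixes the two forms. A hypothetical helper (not part of the pipeline or Cromwell) illustrating the difference:

```python
from urllib.parse import urlparse

def normalize_s3_uri(uri: str) -> str:
    """Collapse a malformed 's3://s3.amazonaws.com/bucket/key' into the
    canonical 's3://bucket/key' form; leave well-formed URIs untouched."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3" and parsed.netloc.endswith("amazonaws.com"):
        # In path-style addressing the first path segment is the bucket.
        bucket, _, key = parsed.path.lstrip("/").partition("/")
        return f"s3://{bucket}/{key}"
    return uri

print(normalize_s3_uri(
    "s3://s3.amazonaws.com/caper-hic/hic/call-get_ligation_site_regex/get_ligation_site_regex-rc.txt"
))
```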

On AWS we granted full S3 access to both AWS Batch and EC2; when the pipeline starts, directories and files are created in the S3 bucket.

Anyway, I can't find any backend log. Could it be that the script that creates these files can't move them to S3? I think the same script creates the stdout and stderr for the process, and I can't find those files in the S3 bucket either.
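One quick way to confirm whether the rc/stdout/stderr files were ever uploaded is to list the call directory in the bucket. A small sketch (bucket name and workflow ID are copied from the logs above; the helper names are mine) that builds the `aws s3 ls` command to run:

```python
BUCKET = "caper-hic"
WORKFLOW_ID = "756ed754-2f3d-46e4-8323-194f21bf11fb"  # from the error log above

def call_prefix(workflow_id: str, task: str) -> str:
    """Key prefix Cromwell used for a call's outputs, per the paths in this thread."""
    return f"hic/{workflow_id}/call-{task}/"

def listing_command(bucket: str, workflow_id: str, task: str) -> str:
    """AWS CLI command that lists whatever actually landed under the call dir."""
    return f"aws s3 ls s3://{bucket}/{call_prefix(workflow_id, task)}"

# If the rc.txt/stdout/stderr uploads succeeded, they should show up in this listing:
print(listing_command(BUCKET, WORKFLOW_ID, "get_ligation_site_regex"))
```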

leepc12 commented 2 years ago

I think something went wrong with Caper's AWS backend.

```
/bin/bash: /var/scratch/fetch_and_run.sh: Is a directory
```

I will test it on AWS soon and let you know but until then please try on Google Cloud Platform. It's much more stable.

mziebagg commented 2 years ago

Dear all, thanks a lot for your help. We are the Genegoggle startup and received some free credits on AWS, so we can't easily migrate to Google Cloud Platform (costs).

@leepc12 On AWS we have prepared an account/organization and configured a VM for hic-pipeline. If it would speed things up, we can give you access to this test environment.

It would be great if you find something. Thanks in advance.

Best regards, Michał