aws / amazon-genomics-cli

https://aws.github.io/amazon-genomics-cli/
Apache License 2.0

nextflow wes adapter does not return more than 100 tasks #357

Closed wleepang closed 2 years ago

wleepang commented 2 years ago

Describe the Bug

The WES adapter and endpoint for contexts that use Nextflow as an engine return details for at most 100 tasks in the WES GetRunLog response. As a result, agc logs workflow output shows at most 100 tasks.

Steps to Reproduce

I created a workflow that generates 1000 tasks, written in both WDL and Nextflow.

wdl workflow

version 1.0

workflow TestSimple1000 {
    Array[Int] numbers = range(1000)
    scatter ( number in numbers ) {
        call PrintNumber {
            input:
                number = number
        }
    }
}

task PrintNumber {
    input {
        Int number
    }

    command {
        echo ~{number}
    }

    runtime {
        docker: "public.ecr.aws/lts/ubuntu:latest"
        memory: "512 MB"
    }
}

nextflow workflow

nextflow.enable.dsl=2

process PrintNumber {
    container "public.ecr.aws/lts/ubuntu:latest"
    memory "512 MB"
    input:
        val number

    script:
        println "${number}"
        """
        echo "${number}"
        """
}

workflow {
    range = 0..999
    numbers = Channel.fromList(range.toList())
    PrintNumber(numbers)
}

I created an agc project for these workflows with the following file structure:

The project configuration is:

---
name: Test1000
schemaVersion: 1
workflows:
  test-simple-1000-wdl:
    type:
      language: wdl
      version: 1.0
    sourceURL: ./wdl/test-simple-1000
  test-simple-1000-nextflow:
    type:
      language: nextflow
      version: dsl2
    sourceURL: ./nextflow/test-simple-1000
contexts:
  onDemandCtxWdl:
    engines:
      - type: wdl
        engine: cromwell
  onDemandCtxNextflow:
    engines:
      - type: nextflow
        engine: nextflow
...

Deploy contexts:

agc context deploy --all

Running both workflows:

agc workflow run -c onDemandCtxWdl test-simple-1000-wdl
agc workflow run -c onDemandCtxNextflow test-simple-1000-nextflow

After both workflows completed, I retrieved their logs and counted the lines in the output.

Accounting for the 5 header lines in the agc logs workflow response, the WDL workflow reports all 1000 tasks:

agc logs workflow test-simple-1000-wdl | wc -l
1005

The Nextflow workflow reports only 100 tasks:

agc logs workflow test-simple-1000-nextflow | wc -l
105

Using awscurl to retrieve the WES GetRunLog response directly for the onDemandCtxWdl context:

run_id=<test-simple-1000-wdl run-id>
wes="$(agc context describe onDemandCtxWdl | grep WESENDPOINT | cut -f 2)ga4gh/wes/v1"
runlog=$(awscurl "${wes}/runs/${run_id}")
echo "$runlog" | jq '.task_logs | length'

The above returns:

1000

Using awscurl to retrieve the WES GetRunLog response directly for the onDemandCtxNextflow context:

run_id=<test-simple-1000-nextflow run-id>
wes="$(agc context describe onDemandCtxNextflow | grep WESENDPOINT | cut -f 2)ga4gh/wes/v1"
runlog=$(awscurl "${wes}/runs/${run_id}")
echo "$runlog" | jq '.task_logs | length'

The above returns:

100

Additional Context

Operating System: Linux
AGC Version: 1.2.0
Was AGC set up with a custom bucket: No
Was AGC set up with a custom VPC: No

wleepang commented 2 years ago

After a bit of testing, I believe the offending line of code is here: https://github.com/aws/amazon-genomics-cli/blob/a51f57ff1f825aaf06fe7c74409e91ff74dd9061/packages/wes_adapter/amazon_genomics/wes/adapters/NextflowWESAdapter.py#L139

Raising the value to the hard limit of 10,000 would be an easy short-term fix, but a workflow with more than 10,000 tasks would still be truncated and would need a better solution.
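One possible longer-term approach is to follow pagination tokens instead of relying on a single call with a large limit. The sketch below is not the adapter's code. It assumes the events are read through a boto3 CloudWatch Logs client's `filter_log_events` call, which returns a `nextToken` when more events remain. The names `iter_log_events` and `FakeLogsClient` are hypothetical; the fake client exists only so the example runs without AWS access.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_log_events(logs_client: Any, log_group: str, log_stream: str,
                    page_size: int = 10000) -> Iterator[Dict[str, Any]]:
    """Yield every event in a log stream, following nextToken page by page.

    Assumption: logs_client behaves like a boto3 CloudWatch Logs client.
    """
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group,
        "logStreamNames": [log_stream],
        "limit": page_size,
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break  # no token means this was the last page
        kwargs["nextToken"] = token


class FakeLogsClient:
    """In-memory stand-in for the CloudWatch Logs client (illustration only)."""

    def __init__(self, messages: List[str]) -> None:
        self._events = [{"message": m} for m in messages]

    def filter_log_events(self, logGroupName: str, logStreamNames: List[str],
                          limit: int = 10000,
                          nextToken: Optional[str] = None) -> Dict[str, Any]:
        # The token here is just the offset of the next page.
        start = int(nextToken) if nextToken else 0
        end = start + limit
        page: Dict[str, Any] = {"events": self._events[start:end]}
        if end < len(self._events):
            page["nextToken"] = str(end)
        return page


if __name__ == "__main__":
    client = FakeLogsClient([f"task {i}" for i in range(1000)])
    events = list(iter_log_events(client, "group", "stream", page_size=100))
    print(len(events))
```

With a page size of 100, a single call to the fake client returns 100 events, while the paginated iterator returns all 1000. The same loop would keep working past 10,000 events, at the cost of more API calls per GetRunLog request.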

wleepang commented 2 years ago

It is also worth pointing out that the log entries being queried are generated only after the workflow completes. The Nextflow head job container dumps the contents of the .nextflow.log file as a cleanup step. Task activity is therefore not available while a workflow is still running, and may be lost entirely if the Nextflow head job fails and does not perform cleanup.

wleepang commented 2 years ago

Attempting to retrieve the logs for a Nextflow workflow while it is still running produces an error:

$ agc logs workflow test-simple-1000 -r 1fe54a99-5431-493c-b7f6-f0af6cffd8fa
2022-03-17T23:43:00Z 𝒊  Showing the logs for 'test-simple-1000'
2022-03-17T23:43:00Z ✘   error="invalid character 'e' looking for beginning of value"
Error: an error occurred invoking 'logs workflow'
with variables: {logsSharedVars:{tail:false contextName: startString: endString: lookBack: filter:} workflowName:test-simple-1000 runId:1fe54a99-5431-493c-b7f6-f0af6cffd8fa taskId: allTasks:false failedTasks:false}
caused by: invalid character 'e' looking for beginning of value
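The message "invalid character 'e' looking for beginning of value" has the shape of a JSON decoding error, which suggests the client received a response body that begins with `e` rather than with a JSON value. The following is a minimal Python sketch of the kind of check that would surface such a body more readably. It is not the CLI's code; the function name `parse_run_log` and the sample error body are hypothetical.

```python
import json
from typing import Any, Dict


def parse_run_log(body: str) -> Dict[str, Any]:
    """Parse a WES GetRunLog body, reporting non-JSON payloads readably."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        # Include a short preview so the caller sees what actually came back.
        preview = body[:80].replace("\n", " ")
        raise ValueError(
            f"WES endpoint returned a non-JSON body: {preview!r}"
        ) from exc
    if not isinstance(parsed, dict):
        raise ValueError(
            f"expected a JSON object, got {type(parsed).__name__}"
        )
    return parsed


if __name__ == "__main__":
    print(parse_run_log('{"task_logs": []}'))
    try:
        # Hypothetical plain-text body, standing in for whatever the
        # endpoint returns while the run is still in progress.
        parse_run_log("error: log stream not found")
    except ValueError as err:
        print(err)
```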