broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

slurm backend file name too long #4703

Open flying-polarbear opened 5 years ago

flying-polarbear commented 5 years ago

My code and environment:

Config file:

backend {
  providers {
    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {

        filesystems {
          local {
            localization: [
              "soft-link",
              "hard-link",
              "copy"
            ]
          }
        }

        runtime-attributes = """
        String time = "2-0"
        Int cpus = 2
        Int memory = 8000
        String queue = "compute"
        """

        submit = """
          sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} -t ${time} -p ${queue} \
          ${"-c " + cpus} \
          --mem-per-cpu=${memory} \
          --wrap "/bin/bash ${script}"
          """

        job-id-regex = "Submitted batch job (\\d+).*"
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
      }
    }
  }
}
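The `job-id-regex` entry in the config above is what lets Cromwell pull the SLURM job ID out of the text `sbatch` prints. As a rough illustration, the Python sketch below applies the same pattern with the standard `re` module. Two assumptions: the doubled backslash in the HOCON string (`\\d`) is an escape, so the regex itself is `Submitted batch job (\d+).*`; and `extract_job_id` and the sample output are hypothetical, so Cromwell's own matching may differ in detail.

```python
import re
from typing import Optional

# The config's pattern after HOCON unescaping: "\\d" in the file is "\d" here.
JOB_ID_REGEX = re.compile(r"Submitted batch job (\d+).*")


def extract_job_id(sbatch_stdout: str) -> Optional[str]:
    """Return the job ID from the first matching line, or None.

    Hypothetical helper approximating Cromwell's use of job-id-regex.
    """
    for line in sbatch_stdout.splitlines():
        match = JOB_ID_REGEX.match(line.strip())
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Made-up sbatch output; a real job ID will differ.
    print(extract_job_id("Submitted batch job 123456\n"))  # 123456
```

Scanning line by line still finds the ID if `sbatch` prints a warning first; `None` signals output the regex does not recognize.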

WDL file:

task SamToFastqAndBwaMem {
  ......
  ......
  command <<<
    set -o pipefail
    set -e

    # set the bash variable needed for the command-line
    bash_ref_fasta=${ref_fasta}

    java -Dsamjdk.compression_level=${compression_level} ${java_opt} -jar ${gotc_path}picard.jar \
      SamToFastq \
      INPUT=${input_bam} \
      FASTQ=/dev/stdout \
      INTERLEAVE=true \
      NON_PF=true \
    | \
      ${bwa_path}${bwa_commandline} /dev/stdin - 2> >(tee ${output_bam_basename}.bwa.stderr.log >&2) \
    | \
      samtools view -1 - > ${output_bam_basename}.bam
  >>>

  runtime {
    backend: "SLURM"
    memory: mem_size
    cpus: num_cpu
  }

  output {
    File output_bam = "${output_bam_basename}.bam"
    File bwa_stderr_log = "${output_bam_basename}.bwa.stderr.log"
  }
}

All parameters are accepted, but I run into the following problems:

1:
    Caused by: common.exception.AggregatedMessageException: Error(s): : Could not localize -> /nfs/disk3/user/gaoyuhui/github/test/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test: doesn't exist
    Cannot localize directory with symbolic links /nfs/disk3/user/gaoyuhui/github/test/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test -> /nfs/disk3/user/gaoyuhui/github/test: Operation not permitted

2:
    ... ... amToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/inputs/-1845554049/test.tmp/cromwell-workflow-logs/workflow.8fc94dc1-722b-40d5-9840-9d6e4a66db21.log: File name too long
        at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:68)
        at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:64)
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:563)
        ... 32 common frames omitted
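One reading of the second error: the repeating `test.tmp/cromwell-executions/.../inputs/-1845554049/` segment is what you would see if the directory being localized were an ancestor of `cromwell-executions`, so that each copy contains the execution tree that receives the copy. The first error does show `/nfs/disk3/user/gaoyuhui/github/test`, the working directory, as the localized input. The Python sketch below only models the path arithmetic under that assumption; the 4096-byte `PATH_MAX` is the usual Linux value, not something the log confirms.

```python
# Models how a path grows when a directory is copied into one of its own
# descendants. Nothing touches the filesystem; only string lengths are computed.

# Working directory from the issue; cromwell-executions lives inside it.
PREFIX = "/nfs/disk3/user/gaoyuhui/github/test/"

# One level of nesting, copied from the repeating part of the error log.
SEGMENT = (
    "cromwell-executions/PreProcessingForVariantDiscovery_GATK4/"
    "8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem/"
    "inputs/-1845554049/test.tmp/"
)

# Typical Linux PATH_MAX, counting the terminating NUL byte (assumption).
PATH_MAX = 4096


def nested_path(depth: int) -> str:
    """Path reached after `depth` levels of self-nesting."""
    return PREFIX + SEGMENT * depth


def depth_until_too_long(limit: int = PATH_MAX) -> int:
    """Smallest nesting depth whose path, plus a NUL byte, exceeds `limit`."""
    depth = 0
    while len(nested_path(depth).encode()) + 1 <= limit:
        depth += 1
    return depth


if __name__ == "__main__":
    depth = depth_until_too_long()
    print(f"each level adds {len(SEGMENT)} bytes; limit exceeded at depth {depth}")
```

Because every level adds a fixed number of bytes, unbounded nesting must hit some limit eventually. The log above is truncated at the start, so it does not say how deep the real copy went.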

Cromwell: v36.1. My working directory is /nfs/disk3/user/gaoyuhui/github/test and contains only 2.wdl and 2.json for testing, but every time, this simple task recurses until the file name becomes too long. Changing the backend to Local gives the same result. Following another thread, I ran this WDL task from a different directory and got the same error. Could someone look into this? How can I get it working? Is it a bug or my mistake? Yours sincerely, Gao
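The paths in both errors place `cromwell-executions` inside `/nfs/disk3/user/gaoyuhui/github/test`, the same working directory that the first error shows being localized as an input. A pre-flight check for that overlap could be as small as the sketch below. `input_contains_exec_root` is a hypothetical helper, not part of Cromwell, and it assumes both paths are on the local filesystem.

```python
from pathlib import Path


def input_contains_exec_root(input_dir: str, exec_root: str) -> bool:
    """True if exec_root is input_dir itself or lies anywhere beneath it.

    Hypothetical pre-flight check: when this is True, localizing input_dir
    by copying would also pick up the execution tree that receives the copy.
    """
    # resolve() makes the paths absolute and follows any existing symlinks.
    inp = Path(input_dir).resolve()
    root = Path(exec_root).resolve()
    return root == inp or inp in root.parents


if __name__ == "__main__":
    work = "/nfs/disk3/user/gaoyuhui/github/test"
    print(input_contains_exec_root(work, work + "/cromwell-executions"))  # True
```

Comparing against `Path.parents` rather than string prefixes keeps a sibling such as `/data/test2` from being treated as inside `/data/test`.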

matthdsm commented 5 years ago

I have the same issue with Torque: it complains that the directory argument (-d) is too long (256+ characters), which causes the jobs to crash.

M
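This report points at a different kind of limit: the scheduler rejecting a long directory argument, independent of the filesystem. A minimal sketch that measures a path against both kinds of limit is below. The 255-byte per-component `NAME_MAX` is the common Linux value, the default `max_arg_len` of 255 is only inferred from the "256+ characters" in the comment (not from Torque documentation), and `path_problems` is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import List

NAME_MAX = 255  # common Linux per-component limit, in bytes (assumption)


def path_problems(path: str, max_arg_len: int = 255) -> List[str]:
    """Return a description of every length limit that `path` exceeds.

    max_arg_len stands in for a scheduler-side cap on the directory argument;
    255 is inferred from the "256+ characters" report, not from Torque docs.
    """
    problems = []
    if len(path) > max_arg_len:
        problems.append(f"argument is {len(path)} chars (max {max_arg_len})")
    for part in PurePosixPath(path).parts:
        if len(part.encode()) > NAME_MAX:
            problems.append(f"component {part[:20]}... is over {NAME_MAX} bytes")
    return problems


if __name__ == "__main__":
    # Call directory from the log in this issue: 157 characters, so no problems.
    call_dir = ("/nfs/disk3/user/gaoyuhui/github/test/cromwell-executions/"
                "PreProcessingForVariantDiscovery_GATK4/"
                "8fc94dc1-722b-40d5-9840-9d6e4a66db21/call-SamToFastqAndBwaMem")
    print(path_problems(call_dir))  # []
```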

kkshaxqd commented 4 years ago

Hello, have you solved this problem? I have run into the same problem. How can I change the length of CWL's default directory?

matthdsm commented 4 years ago


To Google-translate that:

Hello, have you solved this problem? I also ran into the same problem. How can I change the length of the CWL default directory?
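On the question of shortening the default directory: every call directory in the logs above follows the layout `<root>/<workflow name>/<workflow id>/call-<task name>`, and in this issue `<root>` is `/nfs/disk3/user/gaoyuhui/github/test/cromwell-executions`. The sketch below just computes that path for a given root, so you can see how much a shorter root would save. The root `/scratch/cw` is a hypothetical example, and `call_directory` is an illustrative helper rather than Cromwell code.

```python
from pathlib import PurePosixPath

# Names taken from the logs in this issue.
WORKFLOW = "PreProcessingForVariantDiscovery_GATK4"
WORKFLOW_ID = "8fc94dc1-722b-40d5-9840-9d6e4a66db21"
TASK = "SamToFastqAndBwaMem"


def call_directory(root: str, workflow: str = WORKFLOW,
                   workflow_id: str = WORKFLOW_ID, task: str = TASK) -> str:
    """Build <root>/<workflow>/<workflow id>/call-<task>, the layout in the logs."""
    return str(PurePosixPath(root, workflow, workflow_id, f"call-{task}"))


if __name__ == "__main__":
    roots = (
        "/nfs/disk3/user/gaoyuhui/github/test/cromwell-executions",  # from the log
        "/scratch/cw",  # hypothetical shorter root
    )
    for root in roots:
        path = call_directory(root)
        print(len(path), path)
```

As far as we know, the shared-filesystem backends (including a SLURM config like the one in the first post) read this location from a `root` key inside the provider's `config` block, e.g. `root = "/scratch/cw"`; check the example backend configurations for your Cromwell version before relying on it. A shorter root only trims the fixed prefix of each path.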