CSU-KangHu / HiTE

High-precision TE Annotator
GNU General Public License v3.0

Nextflow Pipeline: No such file or directory #27

Open ayoraind opened 4 days ago

ayoraind commented 4 days ago

Following up on issue #25: HiTE still fails when I integrate it into a Nextflow pipeline, apparently because some Python scripts cannot be found. However, when I checked the directory that should contain these scripts (/home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/), the files exist. Following your suggestion in issue #25, I added execute permissions but still got the same error message. More details below:

 executor >  local (3)
  [8c/ed4312] EnvCheck (envcheck)                                  [100%] 4 of 4, cached: 4 ✔
  [9d/877755] LTR (GCF_000733995.1_ASM73399v1_genomic.fna)         [100%] 1 of 1, failed: 1 ✘
  [9c/3c34fb] OtherTE (GCF_000733995.1_ASM73399v1_genomic.fna)     [100%] 1 of 1, failed: 1 ✘
  [-        ] MergeLTROther                                        -
  [39/8c53bc] SplitGenome (GCF_000733995.1_ASM73399v1_genomic.fna) [100%] 1 of 1, failed: 1 ✘
  [-        ] coarseBoundary                                       -
  [-        ] TIR                                                  -
  [-        ] Helitron                                             -
  [-        ] Non_LTR                                              -
  [-        ] BuildLib                                             -
  [-        ] AnnotateGenome                                       -
  [-        ] Benchmarking                                         -
  ---Check if LTR_retriever is installed.

  ---Check if rmblastn is installed.

  ---Check if RepeatModeler is installed.

  ---Check if RepeatMasker is installed.

  Execution cancelled -- Finishing pending tasks before exit
  WARN: Undocumented setting `docker.userEmulation` is not supported any more - please remove it from your config
  ERROR ~ Error executing process > 'SplitGenome (GCF_000733995.1_ASM73399v1_genomic.fna)'

  Caused by:
    Process `SplitGenome (GCF_000733995.1_ASM73399v1_genomic.fna)` terminated with an error exit status (2)

  Command executed:

    python3 /home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/split_genome_chunks.py      -g GCF_000733995.1_ASM73399v1_genomic.fna --chrom_seg_length 1000000 --chunk_size 400

    Command exit status:
    2

  Command output:
    (empty)

  Command error:
    python3: can't open file '/home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/split_genome_chunks.py': [Errno 2] No such file or directory

  Work dir:
    /workspace/genomeqc/work/CSU-KangHu/HiTE_GCF_000733995/work/39/8c53bcefa5e99d065e1661e6805c1c

  Container:
    kanghu/hite:3.2.0

  Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

   -- Check '.nextflow.log' file for details

  . Expression: nextflow.ast.LangHelpers.compareEqual(process.waitFor(), 0) -- Check script './workflows/../modules/local/nextflow/run/main.nf' at line: 33

Source block:
  def cache_dir = Paths.get(workflow.workDir.resolve("${pipeline_name}_${genome.simpleName}").toUri())
  Files.createDirectories(cache_dir)
  def nxf_cmd = [
      'nextflow run',
          pipeline_name,
          nextflow_opts,
          params_file ? "-params-file $params_file" : '',
          additional_config ? "-c $additional_config" : '',
          genome ? "--genome $genome" : '',
          "--outdir $task.workDir/results",
  ]
  def builder = new ProcessBuilder(nxf_cmd.join(" ").tokenize(" "))
  builder.directory(cache_dir.toFile())
  process = builder.start()

  assert process.waitFor() == 0: process.text

Work dir:
  /workspace/genomeqc/work/31/f9e75a8d63d2d1eb02f3cba4fd7ccb

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Is this error still due to permission issues? Would it also be possible to ship the scripts with execute permissions already set, to avoid errors like this?
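For what it's worth, the `[Errno 2]` in the message above is ENOENT ("No such file or directory"), meaning the interpreter could not see the path at all. A permission problem would normally surface as `[Errno 13] Permission denied` instead. One possibility, not confirmed anywhere in this thread, is that the path exists on the host but is not visible inside the `kanghu/hite:3.2.0` container. Below is a minimal sketch for telling the two cases apart; `diagnose` is a hypothetical helper (not part of HiTE), and it only gives a useful answer when run in the same environment as the failing task.

```python
import errno


def diagnose(path):
    """Classify why opening a script file might fail (hypothetical helper)."""
    try:
        with open(path, "rb"):
            pass
    except FileNotFoundError:
        # errno.ENOENT (2): the path is not visible to this process
        return "missing"
    except PermissionError:
        # errno.EACCES (13): the path exists but access is denied
        return "permission"
    except OSError as exc:
        # Anything else, reported by its symbolic errno name when known
        return "other ({})".format(errno.errorcode.get(exc.errno, exc.errno))
    return "ok"
```

Calling `diagnose("/home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/split_genome_chunks.py")` once on the host and once inside the container would show whether the two environments see the same file.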

CSU-KangHu commented 2 days ago

Hi @ayoraind,

This seems like a rather unusual issue. I attempted to replicate the problem you described, but couldn’t reproduce it.

I ran the following command in the directory /home/chenhs/.nextflow/assets/CSU-KangHu/HiTE:

nextflow run main.nf -profile docker --genome demo/genome.fa --outdir test_out

The program ran successfully without any issues.

ayoraind commented 2 days ago

Hi @CSU-KangHu,

Thank you for your response. A well-respected nf-core maintainer looked through the code and asked whether your executable scripts could be moved into the bin/ folder, so that the module/ folder contains only actual Nextflow modules. This would help fix the portability issues I am facing.

CSU-KangHu commented 2 days ago

Hi @ayoraind,

From your description, it seems you suspect that the error is due to Nextflow being unable to locate files within the HiTE/module path, though it can access files in HiTE/bin. However, based on the error message, it appears that Nextflow cannot find the absolute path /home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/split_genome_chunks.py, even though you've confirmed that this file does exist.

Since the Python scripts in the module folder are called from various other locations, I’d prefer not to change their paths unless absolutely necessary.

For your specific case, since you only need to modify the Nextflow portion of the code, perhaps moving the module folder to the bin directory and adjusting line 115 in main.nf (ch_module = "${projectDir}/module") might achieve your objective.

CSU-KangHu commented 2 days ago

Hi @ayoraind, I consulted with a colleague, and I may rewrite the Nextflow script, which could potentially solve your issue. Once I have it completed, I’ll let you know.

ayoraind commented 2 days ago

Thank you @CSU-KangHu, I look forward to the updated script.

CSU-KangHu commented 1 day ago

Hi @ayoraind,

I have made some modifications to the HiTE code by adding the module and tools directories to the Docker system environment variables, and I have also granted executable permissions to the scripts. Previously, you needed to run the script with python3 /home/gitpod/.nextflow/assets/CSU-KangHu/HiTE/module/split_genome_chunks.py, but now you can simply run split_genome_chunks.py directly, which should resolve the issue you're facing.
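Whether a script can be invoked by its bare name depends on the two things described above: its directory being on the search path and the file having the execute bit set. The sketch below checks both at once, assuming the "environment variables" mentioned above refer to `PATH`. `resolve_on_path` is a hypothetical helper for illustration, not part of HiTE or HiTE-Nextflow, and the check assumes a POSIX system.

```python
import os
import shutil


def resolve_on_path(script_name, extra_dirs=()):
    """Return the full path of an executable found via PATH, or None.

    extra_dirs are searched before the directories already on PATH.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    # shutil.which only returns entries that exist and are executable
    return shutil.which(script_name, path=search_path)
```

Running `resolve_on_path("split_genome_chunks.py")` inside the updated container should return a path if the image is set up as described; a result of `None` would suggest that either the directory is not on `PATH` or the execute permission is missing.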

I have extracted the Nextflow-related code and created a new project called HiTE-Nextflow (https://github.com/CSU-KangHu/HiTE-Nextflow), which includes only the main.nf, config files, and demo data from the original HiTE code, making it more lightweight. Please follow the updated tutorial to download the latest HiTE-Nextflow code and Docker image. I believe this should address the issue you're encountering.

If you run into any further issues during usage, feel free to contact me.

Best regards.