fw262 / TAR-scRNA-seq

scRNA-seq analysis beyond gene annotations using transcriptionally active regions (TARs) generated from sequence alignment data
GNU General Public License v3.0

temp2.txt no such file or directory #4

Closed dkeitley closed 3 years ago

dkeitley commented 3 years ago

Hi Michael,

Sorry to open another issue.

I just tried running the pipeline on my own samples and am getting an error relating to the temp2.txt file that's produced in the getCellsList rule.

[Sat Dec  5 18:39:43 2020]
rule getCellsList:
    input: results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_gene_dge.summary.txt
    output: results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_cellList.txt
    jobid: 1739
    wildcards: path=results_out/SIGAA9_S22_L002, sample=SIGAA9_S22_L002

cut: temp2.txt: No such file or directory
[Sat Dec  5 18:39:43 2020]
Finished job 2375.
506 of 2514 steps (20%) done
[Sat Dec  5 18:39:43 2020]
Error in rule getCellsList:
    jobid: 1739
    output: results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_cellList.txt
    shell:

                sed '/^#/ d' < results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_gene_dge.summary.txt > temp.txt
                tail -n +3 temp.txt > temp2.txt
                cut -f1 temp2.txt > results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_cellList.txt
                rm temp.txt
                rm temp2.txt

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job getCellsList since they might be corrupted:
results_out/SIGAA9_S22_L002/SIGAA9_S22_L002_cellList.txt
INFO    2020-12-05 18:39:45     SamToFastq      Processed     3,000,000 records.  Elapsed time: 00:00:23s.  Time for last 1,000,000:    6s.  Last read position: */*

I tried leaving out this sample and running on the rest but I'm still getting the same error. It works fine running on a single sample, so I'm not sure if it's an issue with certain fastq files or if it's due to something else...?

Any suggestions would be much appreciated.

Many thanks,

Dan

fw262 commented 3 years ago

Hi Dan,

I appreciate your feedback and thank you for pointing out this bug.

The error is most likely a race between samples that are processed in parallel. Every getCellsList job wrote to the same temp.txt and temp2.txt in the working directory, so one job could remove those files while another job still needed them. I have updated the rule "getCellsList" in the Snakefile so that each sample gets its own temporary files instead of sharing names that a parallel job may delete.
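To illustrate the idea (this is only a sketch, not the commit that was pushed), the same three steps can be wrapped so that every invocation gets uniquely named scratch files. The function name `get_cells_list` and the `cells.XXXXXX` template are hypothetical and do not appear in the pipeline; the `sed`, `tail` and `cut` commands are the ones from the rule.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the pipeline: writes the first column of a
# DGE summary file to an output file, using scratch files unique to this call.
get_cells_list() {
    local summary="$1" out="$2"
    local tmp1 tmp2
    # mktemp creates a fresh, uniquely named file on every call, so two jobs
    # running at the same time never share (or delete) each other's files.
    tmp1="$(mktemp "${TMPDIR:-/tmp}/cells.XXXXXX")"
    tmp2="$(mktemp "${TMPDIR:-/tmp}/cells.XXXXXX")"

    sed '/^#/ d' < "$summary" > "$tmp1"   # drop lines starting with '#'
    tail -n +3 "$tmp1" > "$tmp2"          # keep everything from line 3 onward
    cut -f1 "$tmp2" > "$out"              # keep only the first tab-separated column

    rm -f "$tmp1" "$tmp2"
}

# Example call (paths are placeholders):
# get_cells_list sample_gene_dge.summary.txt sample_cellList.txt
```

Because the scratch names no longer depend on the working directory or the sample name, running many samples at once should not trigger the "No such file or directory" error from `cut`.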

I hope that helps resolve your issue!

Best, Michael

dkeitley commented 3 years ago

Thanks Michael! I changed the pushed fix slightly in my local version (see below) so that the temporary files are named more cleanly. I think {input}.temp.txt was expanding to "{path}/{sample}_gene_dge.summary.txt.temp.txt".

rule getCellsList:
        input:  '{path}/{sample}_gene_dge.summary.txt'
        output: '{path}/{sample}_cellList.txt'
        params:
                sample='{sample}'
        shell:
                """
                sed '/^#/ d' < {input} > {params.sample}_temp.txt
                tail -n +3 {params.sample}_temp.txt > {params.sample}_temp2.txt
                cut -f1 {params.sample}_temp2.txt > {output}
                rm {params.sample}_temp.txt
                rm {params.sample}_temp2.txt
                """

Thanks again,

Dan
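For readers who hit the same error: another option, sketched here as an assumption rather than something either poster proposed, is to chain the three commands with pipes so that no temporary files exist at all. The function name `get_cells_list_piped` is hypothetical; the commands themselves are unchanged from the rule above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the pipeline: same result as the rule's
# shell block, but intermediate data flows through pipes instead of files.
get_cells_list_piped() {
    local summary="$1" out="$2"
    # 1) drop lines starting with '#'
    # 2) keep everything from line 3 of what remains
    # 3) keep only the first tab-separated column
    sed '/^#/ d' < "$summary" | tail -n +3 | cut -f1 > "$out"
}

# Example call (paths are placeholders):
# get_cells_list_piped sample_gene_dge.summary.txt sample_cellList.txt
```

With nothing written to shared file names, parallel jobs have nothing to collide on, and there is no cleanup step that could fail.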