pkrithivasan opened 3 months ago
Hi, hard to say. The error "returned non-zero exit status 1" usually appears when there is not enough memory allocated. Can you please check that? Also, you may want to set recount: false, otherwise your samples get recounted every time you execute the aberrantSplicing rule. Finally, add the --rerun-triggers mtime parameter every time you run the aberrantSplicing module.
Hi, thanks for the prompt response. I believe I am providing sufficient memory for the run; I'm running DROP on a machine with 256 GB of memory and 30 cores. I re-ran the same set of 10 samples with recount: false and --rerun-triggers mtime, and still get the same error.
I manually ran the 03_filter_expression_FraseR.R script and ran into the same error. This specific line throws the error:
devNull <- saveFraserDataSet(fds, dir = workingDir)
Hi, can you check what the output of dir.exists(workingDir) is in your case? It seems there is an issue in recognizing that the directory already exists for some reason.
Also, what version of FRASER are you using?
Hi,
This is the FRASER version: FRASER_1.99.4, and the directory does exist:
dir.exists("Output/processed_data/aberrant_splicing/datasets/")
[1] TRUE
Also, I should add that when I run the same module on a single sample with external counts from 10 other samples, this AberrantSplicing_pipeline_Counting_03_filter_expression_FraseR_R script runs without error.
I believe I am running into the same error (DROP version 1.3.3). I have managed to run through the splicing module previously when using external counts. In this run I only have a set of ~100 non-external samples. The AE module went through without external counts.
Looking in htop while running, it looks like there is plenty of RAM available, so I don't think this is a memory issue.
A bit further up in the R script I see that it creates a symlink if no external counts are present. Can it be related to this? It seems saveFraserDataSet in the FRASER package somehow collides with this.
# Add external data if provided by dataset
if(length(exCountIDs) > 0){
    ...
} else {
    message("symLink fraser dir")
    file.symlink(paste0(workingDir, "savedObjects/", "raw-local-", dataset),
                 paste0(workingDir, "savedObjects/", "raw-", dataset))

    fds@colData$isExternal <- as.factor(FALSE)
    workingDir(fds) <- workingDir
    name(fds) <- paste0("raw-", dataset)
}
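The file.symlink call above is the culprit when workingDir is a relative path. A minimal Python sketch (illustrative only, with made-up paths; not the actual DROP code) of why: a symlink whose target is a relative path is resolved relative to the directory containing the link, not the process's working directory, so the link ends up dangling.

```python
import os
import tempfile

# Illustrative sketch (made-up paths): a symlink with a *relative* target is
# resolved relative to the directory that contains the link, not the cwd.
tmp = tempfile.mkdtemp()
os.chdir(tmp)
saved = os.path.join("output", "savedObjects")
os.makedirs(os.path.join(saved, "raw-local-fraser"))

# Mimic file.symlink(paste0(workingDir, ...), ...) with a relative workingDir:
os.symlink(os.path.join(saved, "raw-local-fraser"),   # target, relative
           os.path.join(saved, "raw-fraser"))         # link name

print(os.path.islink(os.path.join(saved, "raw-fraser")))  # True: link created
print(os.path.isdir(os.path.join(saved, "raw-fraser")))   # False: dangling
```

The link itself is created without error; it only breaks once something tries to resolve it, which matches the behavior reported below.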
The singularity container I am using to run DROP uses FRASER_1.99.3.
Here is my stack trace:
Fri Aug 16 12:18:54 2024: Filtering out introns with low read support ...
Fri Aug 16 12:20:38 2024: Filtering out non-variable introns ...
Fri Aug 16 12:21:10 2024: Filtering done!
Error in checkForAndCreateDir(fds, outDir) :
Can not create workding directory: output/processed_data/aberrant_splicing/datasets//savedObjects/raw-fraser
Calls: saveFraserDataSet -> checkForAndCreateDir
In addition: Warning message:
In dir.create(dir, recursive = TRUE) :
'output/processed_data/aberrant_splicing/datasets//savedObjects/raw-fraser' already exists
Execution halted
[Fri Aug 16 12:21:11 2024]
Error in rule AberrantSplicing_pipeline_Counting_03_filter_expression_FraseR_R:
jobid: 3
input: output/processed_data/aberrant_splicing/datasets/savedObjects/raw-local-fraser/jaccard.h5, Scripts/AberrantSplicing/pipeline/Counting/03_filter_expression_FraseR.R
output: output/processed_data/aberrant_splicing/datasets/savedObjects/fraser/fds-object.RDS, output/processed_data/aberrant_splicing/datasets/savedObjects/fraser/filter_FRASER2.done
log: <work>/.drop/tmp/AS/fraser/03_filter.Rds (check log file(s) for error details)
RuleException:
CalledProcessError in file /tmp/tmpuq_thm5b, line 89:
Command 'set -euo pipefail; Rscript --vanilla <work>/.snakemake/scripts/tmporwzmnov.03_filter_expression_FraseR.R' returned non-zero exit status 1.
File "/tmp/tmpuq_thm5b", line 89, in __rule_AberrantSplicing_pipeline_Counting_03_filter_expression_FraseR_R
File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-08-16T121828.769496.snakemake.log
Any thoughts on this one? This is the one step blocking us at the moment from evaluating the DROP output in our data.
Hi @Jakob37,
I cannot reproduce this issue and haven't seen it before. My guess is that if you remove the directory (output/processed_data/aberrant_splicing/datasets/savedObjects/raw-fraser) and re-run the pipeline, it will run successfully.
Hello @AtaJadidAhari, and thank you for taking the time to look into this.
I dug into this a bit more. I could indeed reproduce the error, and I also tried removing the soft link as you suggested, which did not solve it.
The issue seems to be that the soft link to the raw-local-fraser folder is created using the wrong working directory, which yields an invalid link:
<workdir>/output/processed_data/aberrant_splicing/datasets/savedObjects $ ls -l
lrwxrwxrwx 1 jakob cmd-bnf 78 Aug 22 20:31 raw-fraser -> output/processed_data/aberrant_splicing/datasets/savedObjects/raw-local-fraser
drwxr-xr-x 3 jakob cmd-bnf 19 Aug 22 20:31 raw-local-fraser
This causes the crash as follows: saveFraserDataSet in saveHDF5Objects.R is called, which in turn calls checkForAndCreateDir in helper-functions.R. There, dir.exists(dir) yields FALSE on the broken symlink, but dir.create(dir, recursive = TRUE) then fails to create the directory and only emits a warning:

checkForAndCreateDir <- function(object, dir){
    verbose <- 0
    if(is(object, "FraserDataSet")){
        verbose <- verbose(object)
        if(missing(dir)){
            dir <- workingDir(object)
        }
    }
    if(!dir.exists(dir)){
        if(verbose > 1){
            message(date(), ": The given working directory '",
                    dir, "' does not exists. We will create it.")
        }
        dir.create(dir, recursive=TRUE)
    }
    if(!dir.exists(dir)){
        stop("Can not create workding directory: ", dir)
    }
    return(TRUE)
}
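The same trap is easy to show in Python (an illustrative analogue, not the FRASER code): a dangling symlink fails the "does the directory exist?" check, yet still blocks creating a directory under that name, so the check and the create call disagree.

```python
import os
import tempfile

# Illustrative analogue of checkForAndCreateDir's failure mode: a dangling
# symlink makes the existence check and the create call disagree.
base = tempfile.mkdtemp()
link = os.path.join(base, "raw-fraser")
os.symlink(os.path.join(base, "no-such-target"), link)  # dangling link

print(os.path.isdir(link))   # False, like dir.exists() on the broken link
try:
    os.mkdir(link)           # like dir.create(): the name is already taken
except FileExistsError:
    print("already exists")  # creation fails although isdir() said False
```

This is exactly the pattern in the stack trace: "already exists" as a warning, followed by "Can not create workding directory".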
I tried hard-coding the absolute path of the working directory into the DROP script, and then the step went through successfully.
I am unsure why it sometimes works, as it does for you. Might it be a version issue? (Edit: I see now that the author of the issue hits the same problem with the latest version.) Or something with running it through SLURM. I am using a container running DROP 1.3.3 (run through the Nextflow pipeline Tomte).
OK, after some more digging I think I have figured out the issue.
It seems the current code only works when given an absolute path. With a relative path, it crashes here.
The working dir is calculated as such:
#' - workingDir: '`sm cfg.getProcessedDataDir() + "/aberrant_splicing/datasets/"`'
The cfg.getProcessedDataDir() value is assigned as follows:
self.processedDataDir = self.root / "processed_data"
I suspect self.root comes from the config.yaml file.
At the moment (as generated by the Nextflow pipeline I am running), it is provided as such, i.e. with a relative path:
root: output
I think we could fix this in the Tomte pipeline so that it isn't an issue there.
It would still be helpful, though, to have an early error for relative paths if they are not allowed.
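A hypothetical sketch (Python; resolve_root and allow_relative are made-up names, not part of DROP) of what such an early check could look like: either reject a relative root up front, or normalize it to an absolute path before any symlinks are derived from it.

```python
from pathlib import Path

# Hypothetical sketch of validating the config 'root' early.
# resolve_root and allow_relative are invented names, not DROP's API.
def resolve_root(root: str, allow_relative: bool = True) -> Path:
    p = Path(root)
    if p.is_absolute():
        return p
    if not allow_relative:
        raise ValueError(f"config 'root' must be an absolute path, got {root!r}")
    # Anchoring against the current working directory up front means every
    # derived path (and every symlink target) is absolute and valid anywhere.
    return (Path.cwd() / p).resolve()

print(resolve_root("output"))  # e.g. /current/workdir/output
```

Either branch would avoid the dangling-symlink state instead of failing later inside FRASER.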
Thanks @Jakob37 for digging into this! We'll now see on our side how to prevent it.
Hi,
I'm running DROP v1.4.0 on a set of 10 samples, with only the AberrantSplicing and AberrantExpression modules. The AberrantExpression module completes successfully, but I get the following error with the AberrantSplicing module:
This is what I have in my config file:
And this is the snakemake command:
snakemake --cores 3 -k
Please let me know if there's something that needs to be updated in the config file. Thanks!