Closed stolarczyk closed 3 years ago
Glad the tutorial is running well! I just glanced over the looper log file. It looks like `bedstat` can't completely process those bed files because of an issue with `GenomicDistributions`, more specifically the `plotQTHist` function.
```
Error in cut.default(dists, divisions, labels) : 'breaks' are not unique
Calls: doItAall ... grid.draw -> plotQTHist -> cutDists -> cut -> cut.default
In addition: Warning message:
Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
Execution halted
```
Probably the distribution isn't wide enough, so the breaks end up duplicated. That's a bug in GenomicDistributions (GD); we should create an issue.
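A quick way to spot files likely to hit this is to count the distinct region widths in a BED file. The sketch below assumes the breaks are derived from the distribution of region widths (an assumption about GD internals, not confirmed here) and uses a made-up three-line BED file purely for illustration.

```shell
# Hypothetical fixed-width BED file, created only for this illustration.
bed="$(mktemp)"
printf 'chr1\t100\t600\nchr1\t1000\t1500\nchr2\t50\t550\n' > "$bed"

# Width of each region is end - start (columns 3 and 2); count distinct values.
n_widths="$(awk '{ print $3 - $2 }' "$bed" | sort -n | uniq | wc -l | tr -d ' ')"

if [ "$n_widths" -le 1 ]; then
  # Assumption: with a single distinct width, computed breaks would collapse
  # to the same value, which cut() rejects as non-unique.
  echo "All regions have the same width; breaks would likely not be unique"
fi

rm -f "$bed"
```

If the count is 1, the file is fixed-width and is a candidate for the `'breaks' are not unique` error above.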
@joseverdezoto in the `looper run` command you added the `-R` option. `looper run` has no `-R` option defined, so it does nothing. Perhaps you wanted to pass the argument to the pipeline. The argument passing strategy has changed in looper v1.2.0.
I'll make the change in the code. Just wanted to point that out for future reference. See http://looper.databio.org/en/latest/parameterizing-pipelines/
I added that flag because I came across a warning that the pipeline wasn't properly shut down. That message suggested running looper in `-R` mode. I'll keep that in mind.
Do you still have that log somewhere?
I don't think I do. I removed the entire tutorial produced folder when I ran it again. I'll let you know if I come across that warning again.
I presume that the message you're referring to comes from pypiper. It suggests running the pipeline in recover mode, not looper, so using `looper run --command-extra="-R"` is the way to go.
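As a minimal sketch of that suggestion, assuming looper v1.2.0 or later and a hypothetical config file named `project_config.yaml`, the recover flag can be forwarded to the pipeline as shown below. The `run_recover` wrapper and the `DRY_RUN` switch are illustrative helpers, not part of looper.

```shell
# Hypothetical project config; replace with your own file.
config="project_config.yaml"

# Set to 0 to actually invoke looper; 1 only prints the command.
DRY_RUN=1

run_recover() {
  if [ "$DRY_RUN" = "1" ]; then
    # Show the command that would be run.
    printf 'looper run %s --command-extra="-R"\n' "$1"
  else
    # -R is not parsed by looper; it is appended to the pipeline command,
    # where pypiper interprets it as recover mode.
    looper run "$1" --command-extra="-R"
  fi
}

recover_output="$(run_recover "$config")"
echo "$recover_output"
```

Printing the command first makes it easy to confirm that `-R` ends up in the pipeline's arguments instead of looper's.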
I was able to successfully run the tutorial 🎉
FYI, I wanted to use the new PEP and pipeline interface formats, so I cloned the dev/cfg2 branches of our pipelines and `looper`. Additionally, I used `GenomicDistributions@dev` to test the plots with recent updates. With this software configuration I ended up with 10 bedfiles in Elasticsearch, so 5 samples failed. However, I think only the GenomicDistributions discrepancy is actually relevant here, since all the submission scripts were produced successfully.
Link to my `$BBTUTORIAL/outputs/bedstat_output/bedstat_pipeline_logs/looper_logs.txt`
Here's some feedback:
- [x] no need to `cd $HOME` at the beginning. I'd like to run this somewhere else
- [x] no need to unzip the open signal matrices, `data.table::fread` supports reading gzipped files
- [x] software names display style is still not consistent. Sometimes they are `preformatted` and sometimes not
- [x] also pay attention to software names capitalization, e.g. Elasticsearch instead of elasticsearch
- [x] use `mkdir -p` to create nested directories instead of creating dir by dir
- [x] in "Run bedstat on the demo PEP" section: the largest chunk of text in the entire tutorial is devoted to explanation of `bedstat` run splitting (`--no-db-commit` and `--just-db-commit`), which
- [x] again, find the first time you refer to software and add link there. An interested reader would have already looked that up
- [x] proposition: in the beginning briefly define what bedfile and especially bedset mean in our system
- [x] update to PEP 2.0
- [x] fix genomic distributions errors on qthist for fixed width files (?)
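Two of the checklist items (not requiring `cd $HOME`, and using `mkdir -p`) can be sketched together. This is only an illustration: the temporary base directory is an assumption standing in for wherever the reader chooses to run the tutorial, and the nested path is the log directory mentioned earlier in this thread.

```shell
# Assumption: any writable location works as the tutorial root, so no
# `cd $HOME` is needed. A temporary directory is used here for illustration.
BBTUTORIAL="$(mktemp -d)"

# One call creates all missing parent directories at once.
mkdir -p "$BBTUTORIAL/outputs/bedstat_output/bedstat_pipeline_logs"

# Repeating the call is harmless: -p does not fail if the path exists.
mkdir -p "$BBTUTORIAL/outputs/bedstat_output/bedstat_pipeline_logs"

ls -d "$BBTUTORIAL/outputs/bedstat_output/bedstat_pipeline_logs"
```

Because every path is built from `$BBTUTORIAL`, the tutorial can be run from any working directory.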