databio / bedhost

API and UI for BEDbase
http://api.bedbase.org
BSD 2-Clause "Simplified" License

tutorial feedback #27

Closed stolarczyk closed 3 years ago

stolarczyk commented 4 years ago

I was able to successfully run the tutorial 🎉

FYI, I wanted to use the new PEP and pipeline interface formats, so I cloned the dev/cfg2 branches of our pipelines and looper. Additionally, I used GenomicDistributions@dev to test the plots with recent updates.

With this software configuration I ended up with 10 bedfiles in Elasticsearch, so 5 samples failed. However, I think only the GenomicDistributions discrepancy is actually relevant here, since all the submission scripts were produced successfully.

Link to my $BBTUTORIAL/outputs/bedstat_output/bedstat_pipeline_logs/looper_logs.txt


Here's some feedback:

joseverdezoto commented 4 years ago

Glad the tutorial is running well! I just glanced over the looper log file. It looks like bedstat can't completely process those bed files because of an issue with GenomicDistributions, more specifically the plotQTHist function.


```
Error in cut.default(dists, divisions, labels) : 'breaks' are not unique
Calls: doItAall ... grid.draw -> plotQTHist -> cutDists -> cut -> cut.default
In addition: Warning message:
Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
Execution halted
```

nsheff commented 4 years ago

Probably the distribution isn't wide enough, so the breaks end up duplicated. That's a bug in GenomicDistributions (GD); we should create an issue there.
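
To illustrate the hypothesis above, here is a minimal Python sketch of how break points can collapse into duplicates when every value is the same. It is not GenomicDistributions' actual code: the helper names (`equal_width_breaks`, `has_duplicate_breaks`) and the equal-width binning scheme are assumptions made purely for illustration. Only the final condition, repeated breaks, corresponds to what R's `cut()` rejects in the log.

```python
def equal_width_breaks(values, n_bins):
    """Return n_bins + 1 evenly spaced break points spanning the data range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def has_duplicate_breaks(breaks):
    """True when any break point repeats, the condition R's cut() rejects."""
    return len(set(breaks)) != len(breaks)


# Hypothetical region widths: one spread out, one with no spread at all.
wide = [100, 250, 400, 800, 1600]
narrow = [200, 200, 200, 200, 200]

wide_breaks = equal_width_breaks(wide, 4)
narrow_breaks = equal_width_breaks(narrow, 4)

print(has_duplicate_breaks(wide_breaks))    # distinct breaks
print(has_duplicate_breaks(narrow_breaks))  # every break is the same value
```

If this is indeed the cause, a guard that checks for repeated breaks before binning (and falls back to a single bin or skips the plot) would avoid the crash.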

stolarczyk commented 4 years ago

@joseverdezoto in the looper run command you added the -R option. looper run has no -R option defined, so it does nothing. Perhaps you wanted to pass the argument to the pipeline. The argument passing strategy has changed in looper v1.2.0.

I'll make the change in the code. Just wanted to point that out for future reference. See http://looper.databio.org/en/latest/parameterizing-pipelines/

joseverdezoto commented 4 years ago

I added that flag because I came across a warning that the pipeline wasn't properly shut down. That message suggested running looper in -R mode. I'll keep that in mind.

stolarczyk commented 4 years ago

do you still have that log somewhere?

joseverdezoto commented 4 years ago

I don't think I do. I removed the entire tutorial produced folder when I ran it again. I'll let you know if I come across that warning again.

stolarczyk commented 4 years ago

I presume that the message you're referring to comes from pypiper:

https://github.com/databio/pypiper/blob/67908f2ee5f51fa5fdddb67eb6d7891aefeeda6a/pypiper/manager.py#L1099-L1103

It suggests running the pipeline in recover mode, not looper. So using looper run --command-extra="-R" is the way to go.
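
For anyone scripting this, a small Python sketch of assembling that command might look like the following. The helper name `looper_run_command` and the config file name `project_config.yaml` are hypothetical, and passing the config path as a positional argument is an assumption; only the `--command-extra` option and the `-R` value come from this thread.

```python
import shlex


def looper_run_command(config_path, pipeline_args):
    """Build a `looper run` argv that forwards pipeline_args to the pipeline."""
    # --command-extra hands the string to the pipeline instead of looper itself.
    return ["looper", "run", config_path, f"--command-extra={pipeline_args}"]


# Hypothetical config path; replace with the tutorial's actual project config.
cmd = looper_run_command("project_config.yaml", "-R")
printable = shlex.join(cmd)
print(printable)
```

The resulting list can be passed to `subprocess.run(cmd)` as is, which avoids shell quoting problems with the extra arguments.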