aryeelab / hichipper

A preprocessing and QC pipeline for HiChIP data
MIT License

R dependencies are hard to figure out #43

Open mpschr opened 6 years ago

mpschr commented 6 years ago

Hi

I ran into the problem that the qc_report fails because of missing R libraries. It would be great if you listed all the dependencies in the README. As it is, a user has to figure out that '--keep-temp-files' keeps the R scripts from being deleted, and then inspect them to see which libraries are required.

For other users: My R installation was missing readr, DT and networkD3

Best Michael
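A quick way to check for the three packages named above before starting a run is sketched here in base R. The package list covers only what this comment reports as missing, so the pipeline may need more than these.

```r
# Packages reported as missing in this thread; the full set of R packages the
# pipeline needs may be longer.
required <- c("readr", "DT", "networkD3")

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing R packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All listed R packages are installed.")
}
```

Running this in the same R installation that the pipeline calls matters, since a machine can have several R library paths.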

caleblareau commented 6 years ago

Hi Michael,

We did our best to document these here:

http://hichipper.readthedocs.io/en/latest/content/Dependencies.html

We will make it more explicit in future versions. Thanks for the feedback!

mpschr commented 6 years ago

Oh, I overlooked that! Thanks for the pointer.

Although I see that I am at fault here, I agree that it could be a bit more explicit.

Best Michael

mpschr commented 6 years ago

Hi, I have another question regarding pandoc, as hichipper is still failing on my machine. Which version counts as 'reasonably recent'?

I get an error that looks as follows:

  |......................................                           |  58%
  ordinary text without R code

  |.........................................                        |  63%
label: unnamed-chunk-6 (with options)
List of 6
 $ echo     : logi FALSE
 $ message  : logi FALSE
 $ warning  : logi FALSE
 $ results  : chr "asis"
 $ out.width: chr "\\textwidth"
 $ fig.width: num 7

Warning: 68 parsing failures.
row # A tibble: 5 x 5 col     row col   expected   actual file                                       expected   <int> <chr> <chr>      <chr>  <chr>                                      actual 1 62821 X1    an integer MT     '/home/mpschr/Documents/projects/patys/pl… file 2 62821 X4    an integer MT     '/home/mpschr/Documents/projects/patys/pl… row 3 62822 X1    an integer MT     '/home/mpschr/Documents/projects/patys/pl… col 4 62822 X4    an integer MT     '/home/mpschr/Documents/projects/patys/pl… expected 5 62823 X1    an integer MT     '/home/mpschr/Documents/projects/patys/pl…
... ................. ... .......................................................................... ........ .......................................................................... ...... .......................................................................... .... .......................................................................... ... ............................. [... truncated]
Quitting from lines 166-189 (qcReport_make.Rmd)
Error in { : task 1 failed - "invalid 'times' argument"
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> %do% -> <Anonymous>
In addition: Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)

Execution halted
Processing: SAMPLE4  
Error in { : task 1 failed - "missing value where TRUE/FALSE needed"
Calls: %do% -> <Anonymous>
Execution halted

My pandoc version:

pandoc 1.19.1 Compiled with pandoc-types 1.17.0.4, texmath 0.9, highlighting-kate 0.6.3

Maybe it is not even related to pandoc.

caleblareau commented 6 years ago

My initial guess would be that a sample as processed by the pipeline is somehow missing values needed for the QC report.

Can you run with --keep-temp-files and then send an ls -lrt of the output directory?

mpschr commented 6 years ago

This is the output

-rw-rw-r-- 1 mpuser mpuser 5,2M Jun 18 09:24 userSuppliedPeaks.bed.tmp
-rw-rw-r-- 1 mpuser mpuser 1,1M Jun 18 09:24 userSuppliedPeaks.bed.tmp_pad.bed.tmp
-rw-rw-r-- 1 mpuser mpuser 907K Jun 18 09:24 userSuppliedPeaks.bed.tmp_pad.bed.tmprf.tmp
-rw-rw-r-- 1 mpuser mpuser 884K Jun 18 09:26 SAMPLE5_temporary_peaks.merged.bed.tmp
-rw-rw-r-- 1 mpuser mpuser   16 Jun 18 09:31 SAMPLE5.peakReads.tmp
-rw-rw-r-- 1 mpuser mpuser 1,6G Jun 18 09:34 SAMPLE5_interactions.bedpe.tmp
-rw-rw-r-- 1 mpuser mpuser 345M Jun 18 09:36 SAMPLE5_anchor1.bed.tmp
-rw-rw-r-- 1 mpuser mpuser 344M Jun 18 09:38 SAMPLE5_anchor2.bed.tmp
-rw-rw-r-- 1 mpuser mpuser  10M Jun 18 09:38 SAMPLE5_anchor.interactions.bedpe.tmp
-rw-rw-r-- 1 mpuser mpuser  11M Jun 18 09:38 SAMPLE5.loop_counts.bedpe.tmp
-rw-rw-r-- 1 mpuser mpuser 7,5M Jun 18 09:38 SAMPLE5.inter.loop_counts.bedpe
-rw-rw-r-- 1 mpuser mpuser 2,8M Jun 18 09:38 SAMPLE5.intra.loop_counts.bedpe
-rw-rw-r-- 1 mpuser mpuser 1,4M Jun 18 09:38 SAMPLE5.filt.intra.loop_counts.bedpe
-rw-rw-r-- 1 mpuser mpuser  526 Jun 18 09:38 SAMPLE5.stat
-rw-rw-r-- 1 mpuser mpuser 9,9K Jun 18 09:38 qcReport_make.Rmd
-rw-rw-r-- 1 mpuser mpuser  442 Jun 18 09:38 qcReport.R
-rw-rw-r-- 1 mpuser mpuser 2,4K Jun 18 09:39 results_SAMPLE5.hichipper.log

mpschr commented 6 years ago

I think I found the source of the problem: our data contain mitochondrial (MT) reads, and "MT" cannot be parsed as an integer when read_delim guesses the chromosome columns (X1 and X4) to be integers. Explicitly specifying the column types avoids this issue.

coltypes <- cols(X1 = 'c', X2 = 'i', X3 = 'i', X4 = 'c', X5 = 'i', X6 = 'i', X7 = 'c', X8 = 'i')

x <- suppressMessages(read_delim(sfilename, " ", col_names = FALSE, progress = FALSE, col_types = coltypes))
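For anyone who wants to try the fix outside the pipeline, here is a minimal self-contained sketch using readr. The toy file and its values are made up. Only the eight-column, space-delimited layout and the column types are taken from the snippet above.

```r
library(readr)

# Hypothetical stand-in for the real file: eight space-delimited columns,
# with chromosome names in columns 1 and 4.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 100 200 1 300 400 a 5",
             "2 150 250 2 350 450 b 3",
             "MT 10 20 MT 30 40 c 2"), tmp)

# Declare the chromosome columns as character so "MT" is never parsed as a number.
coltypes <- cols(X1 = "c", X2 = "i", X3 = "i", X4 = "c",
                 X5 = "i", X6 = "i", X7 = "c", X8 = "i")

x <- read_delim(tmp, delim = " ", col_names = FALSE,
                col_types = coltypes, progress = FALSE)
str(x$X1)
```

With explicit types, readr does not have to guess from the first rows of the file, so a chromosome name that first appears tens of thousands of rows in (as in the log above) no longer conflicts with the guessed type.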

mpschr commented 6 years ago

As a side note: I cannot find the bit of code that creates the .rds file. Where is it?

caleblareau commented 6 years ago

Ah, good catch. I would agree that could be the problem. We've had chatter about removing the mitochondrial chromosomes from the restriction fragment file, which may solve this in part. Let me think about it.

The RDS file is generated at the bottom here: https://github.com/aryeelab/hichipper/blob/master/hichipper/diffloop_work.R
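The idea of dropping mitochondrial entries from the restriction fragment file could look roughly like the base R sketch below. It assumes a BED-like file with the chromosome name as the first whitespace-separated field. The helper name, the list of mitochondrial names, and the toy file are all hypothetical, and none of this is part of hichipper.

```r
# Hypothetical helper (not part of hichipper): drop mitochondrial records from a
# BED-like file whose first whitespace-separated field is the chromosome name.
drop_mito <- function(in_file, out_file,
                      mito_names = c("MT", "chrM", "M", "chrMT")) {
  lines <- readLines(in_file)
  chrom <- sub("[ \t].*$", "", lines)      # text before the first space or tab
  kept <- lines[!(chrom %in% mito_names)]
  writeLines(kept, out_file)
  invisible(length(lines) - length(kept))  # number of records removed
}

# Toy input standing in for a real restriction fragment file.
in_file <- tempfile(fileext = ".bed")
out_file <- tempfile(fileext = ".bed")
writeLines(c("1\t0\t1000", "2\t0\t800", "MT\t0\t500"), in_file)

n_removed <- drop_mito(in_file, out_file)
readLines(out_file)
```

Filtering the input this way would only address the fragment file. Reads that map to MT elsewhere in the pipeline could still show up, which fits the "in part" caveat above.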

mpschr commented 6 years ago

Now that I am familiar with the code, I'd like to point out another thing.

francisfa commented 4 years ago

Hi, I don't understand that. How do you fix this error?

> s <- loopsMake(outdir, samples = paste0(sample, ".filt.intra"), mergegap = 0)
> sampleNames(s) <- sample
> mango_filt_df <- summary(mangoCorrection(s))

Error in if (all(idxa)) return(dlo) else return(.subsetLoops(dlo, idxa)) : 
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)

And I couldn't find the source of the mangoCorrection function:

> mangoCorrection
standardGeneric for "mangoCorrection" defined from package "diffloop"

function (lo, FDR = 1, PValue = 1, nbins = 10) 
standardGeneric("mangoCorrection")

Best, Francis
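On the second point: printing an S4 generic shows only the standardGeneric stub, and the actual code lives in the methods registered for it. The toy class and generic below are made up to illustrate the lookup in base R. For diffloop, the analogous step would presumably be showMethods("mangoCorrection") followed by getMethod() with whichever signature it lists.

```r
library(methods)

# Toy S4 class, generic, and method; the names are made up for illustration.
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape, ...) standardGeneric("area"))
setMethod("area", "Circle", function(shape, ...) pi * shape@r^2)

area                               # prints only the standardGeneric stub
showMethods("area")                # lists the signatures that have methods
m <- getMethod("area", "Circle")   # the method definition, including its body
m
area(new("Circle", r = 1))
```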

caleblareau commented 4 years ago

Can you post dim(s) after your first command?

francisfa commented 4 years ago

Sorry for the delay. dim(s) gives:

> dim(s)
  anchors interactions samples colData rowData
1  265153       593171       1       2       1
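For readers puzzled by the message itself, the base R sketch below shows how "missing value where TRUE/FALSE needed" arises when a logical vector passed to all() contains NA. It is not a diagnosis of what diffloop does with this particular object. The vector is made up, and only the if (all(...)) pattern comes from the error text quoted earlier.

```r
idx <- c(TRUE, NA, TRUE)

all(idx)                # NA: no FALSE present, but one value is unknown
all(idx, na.rm = TRUE)  # TRUE once the NA is ignored

# if() needs a single TRUE or FALSE, so an NA condition is an error.
msg <- tryCatch(
  if (all(idx)) "all TRUE" else "some FALSE",
  error = function(e) conditionMessage(e)
)
msg
```

So a plausible first thing to look for is NA values in whatever the filtering step compares, for example counts or distances that failed to parse.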
