PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Disable pdf creation/running R scripts at preprocess step #638

Open pruzanov opened 1 year ago

pruzanov commented 1 year ago

It appears that during preprocessing, GRIDSS creates a PDF report with an R script. We find that this sometimes hangs silently (for a very long time) and the entire step fails. Is it possible to disable this reporting? I found that the problem arises with inputs that have many (>1M) lines. This metric reports very long insert sizes, and it may be practical to introduce a hard stop, for example after 5-10K.

Adding these lines after loading the histogram in picard/analysis/insertSizeHistogram.R also improves the situation without altering the plot significantly:

 # Sub-sample the metric data to avoid long run times with large inputs
 histogram_rows <- sample(seq_len(nrow(histogram)), min(nrow(histogram), 10000), replace = FALSE)
 histogram <- histogram[histogram_rows, , drop = FALSE]
 # Restore ascending insert size order after sampling
 histogram <- histogram[order(histogram$insert_size), , drop = FALSE]
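
For comparison, the hard cutoff on insert size suggested earlier in this comment could be sketched roughly as below. This is only an illustration: it assumes `histogram` is a data frame with an `insert_size` column, as in the snippet above. The helper `truncate_histogram`, its `max_insert_size` parameter, and the toy data are hypothetical and not part of GRIDSS or Picard.

```r
# Hypothetical helper (not part of GRIDSS or Picard): drop histogram rows
# whose insert size exceeds a fixed cutoff.
truncate_histogram <- function(histogram, max_insert_size = 10000) {
  histogram[histogram$insert_size <= max_insert_size, , drop = FALSE]
}

# Toy data standing in for the loaded insert size histogram (made-up values)
histogram <- data.frame(
  insert_size = c(100, 250, 5000, 20000, 1500000),
  All_Reads.fr_count = c(40, 55, 3, 1, 1)
)

truncated <- truncate_histogram(histogram)
print(truncated)
```

Unlike random sub-sampling, a cutoff keeps every row in the range that is normally plotted and discards only the long tail, so the shape of the main peak is unchanged.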