caravagnalab / CNAqc

CNAqc - Copy Number Alteration (CNA) Quality Check package
GNU General Public License v3.0

VROOM_CONNECTION_SIZE setting #21

Closed wt12318 closed 6 months ago

wt12318 commented 1 year ago

Hi,

I am running the CNAqc Sequenza pipeline with the following code:

Sys.setenv("VROOM_CONNECTION_SIZE" = 500072*100)
library(CNAqc)
library(dplyr)
res  <- Sequenza_CNAqc(
    sample_id = "Patient10_Tumor_tissue_Time-2",
    seqz_file = "Patient10_Tumor_tissue_Time-2.small_filter.seqz.gz", # Binned file
    mutations = mut_dt, # If using an external set of mutations
    sex = "M", # If female,
    verbose = TRUE
  )

It fails with this error:

ℹ Seqz pre-processing [sequenza.extract]
Collecting GC information ........ done

Processing chr1:
   4 variant calls.
   2 copy-number segments.
   1870 heterozygous positions.
   428101 homozygous positions.
Processing chr2:
   3 variant calls.
   2 copy-number segments.
   1291 heterozygous positions.
   310616 homozygous positions.
Processing chr3:
   1 variant calls.
   2 copy-number segments.
   1031 heterozygous positions.
   259370 homozygous positions.
Processing chr4:
Error: The size of the connection buffer (50007200) was not large enough
to fit a complete line:
  * Increase it by setting `Sys.setenv("VROOM_CONNECTION_SIZE")`
✖ Seqz pre-processing [sequenza.extract] ... failed

What is an appropriate value for VROOM_CONNECTION_SIZE? The files are available at https://drive.google.com/file/d/1CCi1FbdtdZrIBu-yID9_GuAAO_jb5TDU/view?usp=sharing and https://drive.google.com/file/d/1HPvlHmwmgDZk1nNkYTqNU__hHBr5eLxj/view?usp=sharing
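One way to reason about the buffer size is to measure the longest line in the input, since the error message says the buffer must fit a complete line. The following is a minimal sketch in base R, not part of CNAqc or vroom: `longest_line_bytes` is a hypothetical helper, and it only shows how the value could be estimated and set. Whether line length is the real cause of the failure above is an assumption that this sketch does not verify.

```r
# Hypothetical helper (not a CNAqc/vroom function): scan a plain or gzipped
# text file in chunks and return the byte length of its longest line.
longest_line_bytes <- function(path, chunk_lines = 100000L) {
  con <- gzfile(path, open = "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  longest <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0L) break
    longest <- max(longest, nchar(lines, type = "bytes"))
  }
  longest
}

# Environment variables are stored as strings, so passing the value as a
# string makes the digits explicit (no risk of scientific notation).
Sys.setenv(VROOM_CONNECTION_SIZE = "500072000")
Sys.getenv("VROOM_CONNECTION_SIZE")

# Usage idea (file name taken from the report above):
# longest_line_bytes("Patient10_Tumor_tissue_Time-2.small_filter.seqz.gz")
```

If the measured line length is far below the configured buffer, a larger value is unlikely to help and the problem probably lies elsewhere.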

Thank you.

wt12318 commented 1 year ago

When I set Sys.setenv("VROOM_CONNECTION_SIZE" = 500072*1000), the pipeline gets past this step, but then another error occurs:

Error in `filter()`:
ℹ In argument: `QC == "PASS"`.
Caused by error:
! object 'QC' not found
Run `rlang::last_error()` to see where the error occurred.

The code where the error happened is:

best_run = L_cache %>% filter(QC == "PASS") %>% arrange(abs(score)) %>% 
    slice(1) %>% pull(run)

After debugging, I found that the cause is that L_cache has no QC column.
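The failing selection could be guarded against a missing column. This is only a sketch of the idea, not the CNAqc implementation: `pick_best_run` is a hypothetical name, the column names `QC`, `score` and `run` come from the snippet above, and the toy layout of `L_cache` is an assumption. It uses base R so that it runs without extra packages.

```r
# Hypothetical guard (not CNAqc code): return the passing run with the
# smallest absolute score, or NA when no QC information is available.
pick_best_run <- function(L_cache) {
  if (!"QC" %in% names(L_cache)) {
    message("L_cache has no QC column; no run can be selected.")
    return(NA_character_)
  }
  # which() drops NA comparisons, so rows with missing QC are ignored
  passing <- L_cache[which(L_cache$QC == "PASS"), , drop = FALSE]
  if (nrow(passing) == 0L) {
    return(NA_character_)
  }
  passing <- passing[order(abs(passing$score)), , drop = FALSE]
  as.character(passing$run[1])
}
```

A guard like this only turns the crash into an explicit "no result"; it does not explain why the QC column is missing when no mutations are supplied.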

This time I ran the following code (without providing mutations):

res  <- CNAqc::Sequenza_CNAqc(
    sample_id = "Patient10_Tumor_tissue_Time-2",
    seqz_file = "Patient10_Tumor_tissue_Time-2.small_filter.seqz.gz",
    mutations = NULL,
    sex = "M",
    verbose = TRUE
  )

caravagn commented 1 year ago

Hello @wt12318, @nicola-calonaci will suggest what is best in this case.

caravagn commented 1 year ago

Any updates on this, @nicola-calonaci?

nicola-calonaci commented 1 year ago

This is probably the same as issue https://github.com/caravagnalab/CNAqc/issues/23. Please provide the full printed output so we can confirm.

caravagn commented 11 months ago

@nicola-calonaci has any progress been made?

caravagn commented 6 months ago

@nicola-calonaci Please fix this or close it.

nicola-calonaci commented 6 months ago

When this happens, you need to gunzip the .seqz.gz files beforehand and provide the uncompressed .seqz files as input to the pipeline.
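The decompression step can also be done from within R. The sketch below uses only base R connections; `gunzip_to` is a hypothetical helper name, not a CNAqc function, and the output path is simply the input path with the `.gz` suffix removed.

```r
# Hypothetical helper (not part of CNAqc): decompress a .gz file to disk
# in binary chunks, keeping the original compressed file in place.
gunzip_to <- function(gz_path, out_path = sub("\\.gz$", "", gz_path),
                      chunk_bytes = 1e7) {
  stopifnot(file.exists(gz_path), out_path != gz_path)
  src <- gzfile(gz_path, open = "rb")
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break
    writeBin(chunk, dst)
  }
  invisible(out_path)
}

# Usage idea, with the file name and arguments from the report above:
# seqz_path <- gunzip_to("Patient10_Tumor_tissue_Time-2.small_filter.seqz.gz")
# res <- CNAqc::Sequenza_CNAqc(
#   sample_id = "Patient10_Tumor_tissue_Time-2",
#   seqz_file = seqz_path,
#   mutations = NULL,
#   sex = "M",
#   verbose = TRUE
# )
```

Running `gunzip -k` on the command line achieves the same result; the R version is just convenient when the whole workflow lives in one script.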

caravagn commented 6 months ago

OK, so I can close it.