epi2me-labs / wf-somatic-variation


Copy of the reference genome included in the output files #18

Closed: selmapichot closed this issue 2 months ago

selmapichot commented 2 months ago


Hi, I found a copy of the reference genome I use among the output files. If I run the workflow on a set of two samples, it is copied twice (once in each sample directory). Is there a way to avoid this, please? Reference genomes are quite large.

Many thanks, Selma.

RenzoTale88 commented 2 months ago

Hi @selmapichot, this is currently not possible, sorry. The workflow emits the reference genome and its indexes because, when needed, it generates a CRAM alignment file, which cannot be read correctly without access to the reference genome. I'd suggest removing these files manually at the end of the run.
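If you do remove the copies by hand, a minimal bash sketch like the one below may help. It assumes the copies are regular files that are byte-identical to the reference you passed in, so it matches on content rather than on file names (the names the workflow gives these files are not stated in this thread). `find_reference_copies` and `remove_reference_copies` are hypothetical helpers, not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: find byte-identical copies of a reference file under a
# workflow output directory and optionally delete them.
# Assumption: the copies are regular files with exactly the same content as
# the original reference; file names are deliberately NOT assumed.
set -euo pipefail

# Print every regular file under OUTDIR whose content is identical to REF.
find_reference_copies() {
  local ref="$1" outdir="$2" size
  size=$(wc -c < "$ref" | tr -d ' ')   # size of the reference in bytes
  # Only same-size files are compared byte by byte with cmp; the original
  # reference itself is skipped (! -samefile) in case it lives under OUTDIR.
  find "$outdir" -type f -size "${size}c" ! -samefile "$ref" \
    -exec cmp -s "$ref" {} \; -print
}

# Delete the copies found above. Pass "--dry-run" as the third argument to
# only list them. Hypothetical helper, not provided by the workflow.
remove_reference_copies() {
  local ref="$1" outdir="$2" mode="${3:-}"
  find_reference_copies "$ref" "$outdir" | while IFS= read -r f; do
    if [ "$mode" = "--dry-run" ]; then
      echo "would remove: $f"
    else
      rm -- "$f" && echo "removed: $f"
    fi
  done
}
```

For example, `remove_reference_copies /path/to/ref.fa output_dir --dry-run` lists what would be deleted; rerun it without `--dry-run` to delete, and repeat with the index file (e.g. `/path/to/ref.fa.fai`) if its copies should go too. Once the copies are gone, tools reading the CRAM may need to be pointed at the original reference explicitly, e.g. `samtools view -T /path/to/ref.fa sample.cram`.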

selmapichot commented 2 months ago

Many thanks for your reply. Is it OK to have a shared cache for different samples, or does it have to be specific to each sample?

RenzoTale88 commented 2 months ago

When you say cache, do you mean a work directory? If so, I'd suggest running the analyses in independent work directories to ensure that the two runs do not interfere with each other. You can, for instance, save the work directory within the output directory with:

--out_dir output_dir -w output_dir/work

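
A minimal bash sketch of that layout for several samples, assuming one run per sample: each run gets its own `--out_dir` and a Nextflow work directory nested inside it. The `run_sample` helper and the `output_<sample>` naming are hypothetical, and the per-sample input options are left to the caller because they are not covered in this thread. By default the helper only prints the command; set `DRY_RUN=0` to launch it.

```shell
#!/usr/bin/env bash
# Sketch only: one workflow run per sample, each with its own output
# directory and its own work directory inside it
# (--out_dir output_dir -w output_dir/work).
# Assumption: the caller supplies the real per-sample input options as
# extra arguments; run_sample itself is a hypothetical helper.
set -euo pipefail

# Print (DRY_RUN=1, the default) or execute the command for one sample.
run_sample() {
  local sample="$1"; shift
  local outdir="output_${sample}"
  local cmd=(nextflow run epi2me-labs/wf-somatic-variation "$@"
             --out_dir "$outdir" -w "$outdir/work")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=0 run_sample sampleA <input options for sampleA>`, then the same for `sampleB`. Keeping each `work` directory under its sample's output directory means the runs never share intermediate files, and deleting a sample's output directory also cleans up its work files.
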
selmapichot commented 2 months ago

Great, thank you, I will try that. All the best, Selma.