NYU-Molecular-Pathology / NGS580-nf

Target exome sequencing analysis for NYU NGS580 gene panel
GNU General Public License v3.0

need to refactor reference file staging methods #10

stevekm closed 5 years ago

stevekm commented 5 years ago

In conjunction with #9, we need to consider alternative staging methods for reference files, especially where a path to an entire directory is passed, as with the ANNOVAR databases and some genome.fa files. Files should be staged directly instead of staging the entire directory or passing just the directory path. Stage-in modes such as 'copy' could be combined with 'scratchDir' for a potential speedup, processing files directly from the HPC node's NVMe SSD space with reduced GPFS overhead. Also consider stage-in via RAM disk, especially for items like the ANNOVAR databases, though this may not be feasible given their extremely large storage requirements. We might need to split ANNOVAR annotation into separate processes for each database, stage each db individually, then recombine the results.
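As a rough illustration of "stage files directly" rather than the whole directory, here is a minimal Python sketch. It is not how the pipeline does staging; the function name `stage_files` and its arguments are hypothetical, and it assumes the caller already knows which file names inside the reference directory a task needs.

```python
import shutil
from pathlib import Path


def stage_files(ref_dir, filenames, scratch_dir):
    """Copy only the named files from ref_dir into scratch_dir.

    Hypothetical helper: returns the list of staged paths and raises
    FileNotFoundError if a requested reference file is missing.
    """
    ref_dir = Path(ref_dir)
    scratch_dir = Path(scratch_dir)
    # Create the scratch location (e.g. node-local SSD space) if needed.
    scratch_dir.mkdir(parents=True, exist_ok=True)

    staged = []
    for name in filenames:
        src = ref_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"reference file not found: {src}")
        # copy2 copies the file contents and metadata, then returns the destination.
        staged.append(Path(shutil.copy2(src, scratch_dir / name)))
    return staged
```

Staging one database per call like this would also fit the idea of splitting annotation into one process per database, since each process would only copy the files it reads.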

stevekm commented 5 years ago

Potential issue with using SLURM --tmp and Nextflow scratchDir: we cannot guarantee that enough space is available if other system users are using tmp space without SLURM accounting.
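One partial mitigation would be a pre-flight check of free space on the scratch filesystem before copying anything. The sketch below uses Python's standard `shutil.disk_usage`; the helper name `has_free_space` is hypothetical. Note that it is only a snapshot: another user can still fill the space after the check passes, so it narrows the problem but does not solve it.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` currently reports
    at least `required_bytes` of free space (hypothetical helper)."""
    return shutil.disk_usage(path).free >= required_bytes
```

A task could call this with the total size of the files it is about to stage and fall back to reading from the shared filesystem when the check fails.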

stevekm commented 5 years ago

Not sure it's worth making this change.