PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a Nextflow pipeline for polygenic score calculation.
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

Not being able to submit several scorefiles #127

Closed: scienception closed this issue 1 year ago

scienception commented 1 year ago

When I submit the following:

nextflow-23.04.2-all run pgsc_calc/main.nf -profile singularity --input ./samplesheet.csv --scorefile PGS_scores/*.txt.gz --target_build GRCh37 --ref pgsc_calc_ref.sqlar

I expected this to run on every scorefile in PGS_scores, but it only computes the first one.

This is the parameter summary the pipeline prints at startup. Shouldn't scorefile list all the scores in PGS_scores?

------------------------------------------------------
  pgscatalog/pgsc_calc v1.3.2
------------------------------------------------------
Core Nextflow options
  runName        : distracted_pesquet
  containerEngine: singularity
  launchDir      : ./nextflow
  workDir        : ./nextflow/work
  projectDir     : ./pgsc_calc
  profile        : singularity
  configFiles    : ./pgsc_calc/nextflow.config

Input/output options
  input          : ./samplesheet.csv
  scorefile      : ./PGS_scores/PGS001828_hmPOS_GRCh37.txt.gz
  pgs_id         : null
  pgp_id         : null
  trait_efo      : null
  target_build   : GRCh37
  ref            : ./pgsc_calc_ref.sqlar
  genotypes_cache: null

Institutional config options
  hostnames      : [:]

Max job request options
  max_cpus       : 2
  max_memory     : 16.GB

This is what PGS_scores directory looks like:

PGS001828_hmPOS_GRCh37.txt.gz  PGS002050_hmPOS_GRCh37.txt.gz  PGS002095_hmPOS_GRCh37.txt.gz  PGS002159_hmPOS_GRCh37.txt.gz  PGS002191_hmPOS_GRCh37.txt.gz
PGS002013_hmPOS_GRCh37.txt.gz  PGS002053_hmPOS_GRCh37.txt.gz  PGS002099_hmPOS_GRCh37.txt.gz  PGS002160_hmPOS_GRCh37.txt.gz  PGS002193_hmPOS_GRCh37.txt.gz
PGS002015_hmPOS_GRCh37.txt.gz  PGS002054_hmPOS_GRCh37.txt.gz  PGS002101_hmPOS_GRCh37.txt.gz  PGS002161_hmPOS_GRCh37.txt.gz  PGS002194_hmPOS_GRCh37.txt.gz

The results/score/aggregated_scores.txt.gz file also contains only the PGS001828 score:

sampleset       IID     DENOM   PGS001828_hmPOS_GRCh37_SUM      PGS001828_hmPOS_GRCh37_AVG
ukbb    1586458 76      2.15695965      0.028381048026315792
ukbb    2764360 76      -0.03492832799999998    -0.00045958326315789446
ukbb    3610699 76      -0.088038938    -0.0011584070789473684
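A quick way to check which scores actually ended up in an aggregated output is to read the header of aggregated_scores.txt.gz and keep the *_SUM columns. The bash sketch below assumes the columns are separated by tabs or spaces, as the excerpt above suggests; `list_scores` is a hypothetical helper name, and the demo file is rebuilt from the header and first row quoted above.

```shell
#!/usr/bin/env bash
# Sketch: list the scores present in an aggregated_scores.txt.gz file.
# Assumption: columns are separated by tabs or spaces.
set -eu

# Hypothetical helper: decompress, keep the header line, split it into one
# column name per line, keep the *_SUM columns and strip the suffix.
list_scores() {
  gzip -dc "$1" | head -n 1 | tr -s ' \t' '\n' | grep '_SUM$' | sed 's/_SUM$//'
}

# Self-contained demo: a tiny file built from the header and first row above.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'sampleset\tIID\tDENOM\tPGS001828_hmPOS_GRCh37_SUM\tPGS001828_hmPOS_GRCh37_AVG\n' \
  > "$demo_dir/aggregated_scores.txt"
printf 'ukbb\t1586458\t76\t2.15695965\t0.028381048026315792\n' \
  >> "$demo_dir/aggregated_scores.txt"
gzip "$demo_dir/aggregated_scores.txt"   # produces aggregated_scores.txt.gz

list_scores "$demo_dir/aggregated_scores.txt.gz"
```

Against a real run you would call `list_scores results/score/aggregated_scores.txt.gz`; a run that picked up all 15 files in PGS_scores should, presumably, print 15 names rather than one.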

nebfield commented 1 year ago

Try putting double quotes (") around the --scorefile value:

 --scorefile "PGS_scores/*.txt.gz"

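Your shell is probably expanding the wildcard before Nextflow even starts, turning the pattern into a list of separate file names. Only the first of those is assigned to --scorefile, which is why the log shows a single file. Quoting the pattern stops the shell from expanding it, so the pipeline receives the glob itself.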
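The difference between the quoted and unquoted forms can be reproduced without Nextflow. In the bash sketch below, `fake_cli` is a hypothetical stand-in that reads exactly one value after `--scorefile` and reports everything else as an extra argument; it is not how Nextflow parses parameters, and the three file names are borrowed from the directory listing above.

```shell
#!/usr/bin/env bash
# Sketch: the shell expands an unquoted glob before the program ever runs.
set -eu

# Throwaway copy of a few file names from the PGS_scores listing.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir/PGS_scores"
touch "$demo_dir/PGS_scores/PGS001828_hmPOS_GRCh37.txt.gz" \
      "$demo_dir/PGS_scores/PGS002013_hmPOS_GRCh37.txt.gz" \
      "$demo_dir/PGS_scores/PGS002015_hmPOS_GRCh37.txt.gz"
cd "$demo_dir"

# Hypothetical CLI: takes exactly one value after --scorefile.
fake_cli() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --scorefile) echo "scorefile=$2"; shift 2 ;;
      *) echo "extra argument: $1"; shift ;;
    esac
  done
}

echo "Unquoted:"
fake_cli --scorefile PGS_scores/*.txt.gz     # shell passes three separate paths
echo "Quoted:"
fake_cli --scorefile "PGS_scores/*.txt.gz"   # program receives the literal pattern
```

In the unquoted call, `scorefile=` gets only PGS001828 and the other two paths become extra arguments, matching the symptom in the report. In the quoted call, the program sees `PGS_scores/*.txt.gz` and any expansion is left to it; judging by the reply below, pgsc_calc handles that pattern.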

scienception commented 1 year ago

That worked, thanks again!