PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

run_ancestry failed on test data #271

Closed ashenfernando1 closed 3 months ago

ashenfernando1 commented 4 months ago

Description of the bug

This may be similar to https://github.com/PGScatalog/pgsc_calc/issues/252, but I haven't found a resolution yet.

If you have any thoughts on what may be going wrong, I would really appreciate it.

Relevant files

nextflow.log

System information

Ubuntu 22.04.01 (Linux VM), nextflow version 23.10.1.5891

smlmbrt commented 4 months ago

@ashenfernando1 this is because the test profile (-profile test,docker) doesn't work with the --run_ancestry option. You'll need to supply actual samples with a samplesheet to try it out: https://pgsc-calc.readthedocs.io/en/latest/how-to/samplesheet.html

ashenfernando1 commented 4 months ago

Thanks for the quick reply, @smlmbrt. I had already tried with samples and a samplesheet and ran into a different error, so I wanted to see whether there was a canned --run_ancestry example to verify that everything was working as intended.

I'll expand a little on the error with the supplied samples and samplesheet, but let me know if it would be more appropriate to create a separate issue/query:

It fails like so:

ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:RELABEL_AFREQ (testtirtya null afreq)'

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:RELABEL_AFREQ (testtirtya null afreq)` terminated with an error exit status (1)

Command executed:

  relabel_ids --maps testtirtya_ALL_matched.txt.gz         --col_from ID_REF         --col_to ID_TARGET         --target_file GRCh37_reference.afreq.zst         --target_col ID         --dataset testtirtya.afreq         --verbose         --combined

  cat <<-END_VERSIONS > versions.yml
  RELABEL_AFREQ:
      pgscatalog_utils: $(echo $(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  root: 2024-03-31 17:02:06 DEBUG    Verbose logging enabled
  pgscatalog_utils.relabel.relabel_ids: 2024-03-31 17:02:06 DEBUG    Writing combined output enabled
  pgscatalog_utils.relabel.relabel_ids: 2024-03-31 17:02:06 DEBUG    Reading map file testtirtya_ALL_matched.txt.gz with gzip.open
  pgscatalog_utils.relabel.relabel_ids: 2024-03-31 17:02:27 DEBUG    Opening testtirtya.afreq_ALL_relabelled.gz and writing header
  Traceback (most recent call last):
    File "/venv/bin/relabel_ids", line 8, in <module>
      sys.exit(relabel_ids())
    File "/venv/lib/python3.10/site-packages/pgscatalog_utils/relabel/relabel_ids.py", line 179, in relabel_ids
      [_relabel_target(args=args, mapping=mapping, split_output=x) for x in split_output]
    File "/venv/lib/python3.10/site-packages/pgscatalog_utils/relabel/relabel_ids.py", line 179, in <listcomp>
      [_relabel_target(args=args, mapping=mapping, split_output=x) for x in split_output]
    File "/venv/lib/python3.10/site-packages/pgscatalog_utils/relabel/relabel_ids.py", line 104, in _relabel_target
      _relabel(in_target=io.TextIOWrapper(reader), mapping=mapping, split_output=split_output, args=args)
    File "/venv/lib/python3.10/site-packages/pgscatalog_utils/relabel/relabel_ids.py", line 149, in _relabel
      line[i_target_col] = mapping[line[i_target_col]]  # revalue column
  KeyError: '22:25761309:C:T'

Is this KeyError: '22:25761309:C:T' particularly informative? The chrom column in my samplesheet is empty because the target data contain multiple chromosomes.
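For readers following the traceback: the failing line looks up each ID from the target file in a dictionary built from the map file, so a KeyError means that ID has no entry in the map. The sketch below reproduces that pattern on made-up data. It is not the pgscatalog_utils implementation; the helper names `load_mapping` and `find_unmapped`, the inline sample rows, and the `variant_a`/`variant_b` labels are all hypothetical. Only the column names (ID_REF, ID_TARGET, ID) and the failing ID come from the log above.

```python
import csv
import io

# Hypothetical stand-ins for the map file and the target .afreq file.
MAP_TEXT = "ID_REF\tID_TARGET\n22:100:A:G\tvariant_a\n22:200:C:T\tvariant_b\n"
TARGET_TEXT = (
    "#CHROM\tID\tREF\tALT\n"
    "22\t22:100:A:G\tA\tG\n"
    "22\t22:25761309:C:T\tC\tT\n"
)


def load_mapping(map_text, col_from="ID_REF", col_to="ID_TARGET"):
    """Build a {from_id: to_id} dictionary from tab-separated map text."""
    reader = csv.DictReader(io.StringIO(map_text), delimiter="\t")
    return {row[col_from]: row[col_to] for row in reader}


def find_unmapped(target_text, mapping, target_col="ID"):
    """Return target IDs that have no entry in the mapping."""
    reader = csv.DictReader(io.StringIO(target_text), delimiter="\t")
    return [row[target_col] for row in reader if row[target_col] not in mapping]


mapping = load_mapping(MAP_TEXT)
missing = find_unmapped(TARGET_TEXT, mapping)
print("IDs missing from the map:", missing)

# A plain dictionary lookup on a missing ID fails the same way as the traceback.
try:
    mapping["22:25761309:C:T"]
except KeyError as err:
    print("KeyError:", err)
```

Running a check like this against the real map and target files (after decompressing them) would show whether one ID or many are missing, which helps distinguish a stale intermediate file from a genuine mismatch.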

smlmbrt commented 4 months ago

I've only seen this happen when the pipeline was resumed from a previous run. Delete the work directory and the cache, then start the run again from scratch.
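A minimal sketch of that cleanup, assuming Nextflow's default layout in which the work directory is `work/` and the cache lives under `.nextflow/` in the launch directory. The function name `clear_nextflow_state` is hypothetical, and the paths will differ if the run used a custom work directory. Note that this permanently deletes intermediate results.

```python
import shutil
from pathlib import Path


def clear_nextflow_state(launch_dir):
    """Remove the default Nextflow work directory and cache under launch_dir.

    Assumes the default locations: 'work' for task outputs and '.nextflow'
    for the cache and run history. Returns the names that were removed.
    """
    removed = []
    for name in ("work", ".nextflow"):
        path = Path(launch_dir) / name
        if path.is_dir():
            shutil.rmtree(path)  # irreversible: intermediate files are lost
            removed.append(name)
    return removed
```

Calling `clear_nextflow_state(".")` from the directory where the pipeline was launched, and then rerunning without resuming, forces every process to execute again.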