tsaojack1234 closed this issue 3 months ago.
A few suggestions to get started:
1) The samplesheet says the VCF only contains chromosome 1, but it contains multiple chromosomes. If your target genomes contain multiple chromosomes, the chrom column should be empty.
2) Your VCF has low variant density and not many samples. The calculator works best with imputed cohort data.
3) To use the main branch, try: nextflow run pgscatalog/pgsc_calc -r main -latest ...
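As a sketch of suggestion 1, the snippet below writes a samplesheet with the same four columns used later in this thread (sampleset, path_prefix, chrom, format) and leaves chrom empty when the target files span several chromosomes. The helper names and the path prefix "cineca_all_chroms" are hypothetical; only the column layout comes from the thread.

```python
import csv
import io
from typing import List, Optional

# Column order taken from the samplesheet shown in this thread.
COLUMNS = ["sampleset", "path_prefix", "chrom", "format"]


def samplesheet_row(sampleset: str, path_prefix: str,
                    chrom: Optional[str] = None, fmt: str = "pfile") -> dict:
    """Build one samplesheet row (hypothetical helper).

    Pass chrom=None when the target files contain multiple chromosomes;
    it is then written as an empty field.
    """
    return {
        "sampleset": sampleset,
        "path_prefix": path_prefix,
        "chrom": "" if chrom is None else chrom,
        "format": fmt,
    }


def write_samplesheet(rows: List[dict]) -> str:
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical multi-chromosome pfile set: chrom is left empty.
sheet = write_samplesheet([samplesheet_row("cineca", "cineca_all_chroms")])
print(sheet, end="")
```

The second data column then reads "cineca,cineca_all_chroms,,pfile"; a file that really holds only chromosome 1 would instead pass chrom="1".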
Hello, thank you for your answer. I followed your suggestions and did the following steps:
1. Keep only the chr1 sites.
2. Run them through the Michigan Imputation Server and obtain "chr1.dose.vcf.gz".
3. Use plink2 to convert "chr1.dose.vcf.gz" into "chr1.dose_axy.pgen", "chr1.dose_axy.psam", and "chr1.dose_axy.pvar":
plink2 --vcf chr1.dose.vcf.gz \
--allow-extra-chr \
--chr 1 \
--make-pgen \
--out chr1.dose_axy
4. Finally, run the pipeline on the converted files:
nextflow run pgscatalog/pgsc_calc \
-r main -latest \
-profile conda \
--input chr1.dose.csv \
--scorefile PGS000137_hmPOS_GRCh38.txt \
--pgs_id PGS000137 \
--target_build GRCh38
This is the content of chr1.dose.csv:
sampleset,path_prefix,chrom,format
cineca,chr1.dose_axy,1,pfile
But I got the following error:
executor > local (3)
[45/dfedfb] PGS…s_id:PGS000137, pgp_id:, trait_efo:]) | 1 of 1 ✔
[b4/2a899b] PGS…LC:INPUT_CHECK:COMBINE_SCOREFILES (1) | 1 of 1 ✔
[- ] PGS…ALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[skipped ] PGS…NK2_RELABELPVAR (cineca chromosome 1) | 1 of 1, stored: 1 ✔
[- ] PGS…C:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[1d/8e0adf] PGS…:MATCH_VARIANTS (cineca chromosome 1) | 1 of 1, failed: 1 ✘
[- ] PGS…PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE -
[- ] PGS…ALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE -
[- ] PGS…:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE -
[- ] PGS…PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT -
[- ] PGS…GSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS -
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (cineca chromosome 1)'
Caused by:
Process `PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (cineca chromosome 1)` terminated with an error exit status (1)
Command executed:
export POLARS_MAX_THREADS=2
pgscatalog-match --dataset cineca --scorefile scorefiles.txt.gz --target GRCh38_cineca_1.pvar.zst --only_match --chrom 1 --outdir $PWD -v
cat <<-END_VERSIONS > versions.yml
MATCH_VARIANTS:
pgscatalog.match: $(echo $(python -c 'import pgscatalog.match; print(pgscatalog.match.__version__)'))
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 WARNING No output format specified, writing to combined scoring file
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 DEBUG Verbose logging enabled
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 INFO --cleanup set (default), temporary files will be deleted
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG Converting ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) to feather format
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) feather conversion complete
pgscatalog.match.lib._match.preprocess: 2024-07-10 08:48:29 DEBUG Complementing column effect_allele
pgscatalog.match.lib._match.preprocess: 2024-07-10 08:48:29 DEBUG Complementing column other_allele
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG Filtering scoring file to chromosome 1
pgscatalog.match.lib.variantframe: 2024-07-10 08:48:29 DEBUG Converting VariantFrame(path='GRCh38_cineca_1.pvar.zst', dataset='cineca', chrom='1', cleanup=True, tmpdir=PosixPath('tmp')) to feather format
......
Please let me know if I've missed anything, thanks.
You probably shouldn't be using a single chromosome to calculate a PGS. PGS000137 contains variants from many chromosomes, so it will cause match errors.
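One way to check this is to count the scoring file's variants per chromosome. A minimal sketch, assuming a tab-separated PGS Catalog harmonized file whose metadata lines start with "#" and whose header includes an hm_chr column (change chrom_col if your file differs). The function names are hypothetical and the demo file is made up, not real PGS000137 data.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter


def open_text(path):
    """Open a scoring file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def variants_per_chromosome(path, chrom_col="hm_chr"):
    """Count scoring-file rows per chromosome.

    Assumes '#' metadata lines precede a tab-separated header row that
    contains chrom_col. Rows with an empty chromosome are counted under ''.
    """
    counts = Counter()
    with open_text(path) as fh:
        body = (line for line in fh if not line.startswith("#"))
        for row in csv.DictReader(body, delimiter="\t"):
            counts[row[chrom_col]] += 1
    return counts


# Toy demo on a made-up two-chromosome file.
with tempfile.TemporaryDirectory() as tmp:
    toy = os.path.join(tmp, "toy_scorefile.txt")
    with open(toy, "w") as fh:
        fh.write("#pgs_id=TOY\n")
        fh.write("rsID\teffect_allele\thm_chr\thm_pos\n")
        fh.write("rs1\tA\t1\t100\nrs2\tC\t2\t200\nrs3\tG\t1\t300\n")
    toy_counts = variants_per_chromosome(toy)

print(dict(toy_counts))  # {'1': 2, '2': 1}
```

Run against PGS000137_hmPOS_GRCh38.txt, more than one key in the result would mean that a chromosome 1 target can only ever match part of the score.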
The full logs would help us understand more. They are stored in the working directory of the failing process (work/1d/8e0adf.../.command.err).
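If it helps, failing task directories can be located without scrolling through the console output. A sketch, assuming the usual Nextflow layout work/<hash prefix>/<rest of hash>/ with an .exitcode file written next to .command.err for each finished task. The function name is hypothetical, and the demo builds a throwaway work directory instead of touching a real one.

```python
import tempfile
from pathlib import Path


def failed_task_logs(work_dir="work"):
    """Yield (task_dir, exit_code) for task dirs with a non-zero .exitcode.

    Assumes each task directory sits two levels below work_dir, as in
    work/1d/8e0adf..., and contains a one-line .exitcode file.
    """
    for exitcode_file in sorted(Path(work_dir).glob("*/*/.exitcode")):
        try:
            code = int(exitcode_file.read_text().strip())
        except ValueError:
            continue  # empty or unexpected content: skip rather than guess
        if code != 0:
            yield exitcode_file.parent, code


# Toy demo: one successful and one failed task in a temporary work dir.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    for prefix, rest, code in [("b4", "2a899b_toy", "0"), ("1d", "8e0adf_toy", "1")]:
        task = work / prefix / rest
        task.mkdir(parents=True)
        (task / ".exitcode").write_text(code + "\n")
        (task / ".command.err").write_text("toy error log\n")
    toy_failed = [(f"{d.parent.name}/{d.name}", c) for d, c in failed_task_logs(work)]

print(toy_failed)  # [('1d/8e0adf_toy', 1)]
```

Pointed at the real work/ directory, task_dir / ".command.err" for each yielded entry is the log worth posting.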
This is the ".command.err" file:
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 WARNING No output format specified, writing to combined scoring file
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 DEBUG Verbose logging enabled
pgscatalog.match.cli.match_cli: 2024-07-10 08:48:29 INFO --cleanup set (default), temporary files will be deleted
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG Converting ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) to feather format
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) feather conversion complete
pgscatalog.match.lib._match.preprocess: 2024-07-10 08:48:29 DEBUG Complementing column effect_allele
pgscatalog.match.lib._match.preprocess: 2024-07-10 08:48:29 DEBUG Complementing column other_allele
pgscatalog.match.lib.scoringfileframe: 2024-07-10 08:48:29 DEBUG Filtering scoring file to chromosome 1
pgscatalog.match.lib.variantframe: 2024-07-10 08:48:29 DEBUG Converting VariantFrame(path='GRCh38_cineca_1.pvar.zst', dataset='cineca', chrom='1', cleanup=True, tmpdir=PosixPath('/home/yuliangtsao/ext_hdd2/prs/work/1d/8e0adfe7a6e6aa8622c62d0585853a/tmp')) to feather format
Traceback (most recent call last):
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/bin/pgscatalog-match", line 10, in <module>
    sys.exit(run_match())
             ^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
    ipc_path = get_match_candidates(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
    with variants as target_df:
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
    self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
    return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
    batches = reader.next_batches(batch_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuliangtsao/ext_hdd2/prs/work/conda/pgscatalog-utils-cc52ffcd2b21fb989b3730d3c8b46423/lib/python3.12/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
    batches = self._reader.next_batches(n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: found more fields than defined in 'Schema'
Consider setting 'truncate_ragged_lines=True'.
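The polars message suggests that at least one row of the target variant file has more tab-separated fields than its header declares. A stdlib-only sketch to locate such rows, assuming a plain-text .pvar with a #CHROM header line (files without a header are not checked). The function name is hypothetical. The pipeline's GRCh38_cineca_1.pvar.zst would need decompressing first (e.g. zstd -dc GRCh38_cineca_1.pvar.zst > check.pvar), or it can be pointed at chr1.dose_axy.pvar directly.

```python
import io
from typing import Iterable, List, Tuple


def ragged_pvar_lines(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, n_fields, n_header_fields) for every data row
    whose tab-separated field count differs from the '#CHROM' header.

    Lines before the header (e.g. '##' meta lines) and blank lines are skipped.
    If no '#CHROM' header is found, nothing is reported.
    """
    n_header = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if n_header is None:
            if line.startswith("#CHROM"):
                n_header = len(line.split("\t"))
            continue
        if not line:
            continue
        n_fields = len(line.split("\t"))
        if n_fields != n_header:
            problems.append((lineno, n_fields, n_header))
    return problems


# Toy input: the second data row has one field too many.
toy = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\trs1\tA\tG\n"
    "1\t200\trs2\tC\tT\textra\n"
)
toy_problems = ragged_pvar_lines(toy)
print(toy_problems)  # [(4, 6, 5)]
```

On a real file, open(...) can be passed in place of the StringIO. Setting truncate_ragged_lines=True would make polars drop the extra fields rather than show where they come from, so locating the offending rows first is probably more informative.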
Thank you.
Could you try again with the latest release please:
$ rm -r work/ # delete any existing caches
$ nextflow run pgscatalog/pgsc_calc -r v2.0.0-beta.1 ...
Hello, thank you for the tool. I would like to ask some questions.
This is my error description:
This is my command line:
I tried "v2.0.0-alpha.6", "v2.0.0-alpha.6", and "v2.0.0-beta", but none of them worked.
input_file: SRR515199_PGS000025.sorted.vcf.gz SRR515199_PGS000025_vcf.sorted.csv
environment: nextflow version 24.04.2.5914 Ubuntu 18.04
Best regards