PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a Nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

"AssertionError: Duplicate IDs in final matches" even though there are no duplicate IDs #232

Closed Humere closed 7 months ago

Humere commented 9 months ago

Hi everyone! For my internship I am comparing VCF data provided by the companies 23andMe and iGene. pgsc_calc calculates the polygenic scores for the iGene data (GRCh38) but not for the 23andMe data (GRCh37). It reports that there are duplicate IDs in the final matches, even though I filtered those out. I had the same problem with the iGene data and managed to solve it. I applied the same fix to the 23andMe data and checked multiple times for duplicate IDs, but there are none. I also tried changing the #CHROM values, for example 1 to chr1 and vice versa. Can anyone give me some tips or a solution for this problem? I highly doubt it, but could it have something to do with the target build (GRCh37)?

I would appreciate your help a lot!
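
For reference, a duplicate check of the kind described above could be sketched roughly as follows. This is only an illustration, not the check pgsc_calc itself performs: the function name `find_vcf_duplicates` is hypothetical, and the assumption that duplicates may also show up as repeated chromosome/position/allele combinations (with or without a `chr` prefix) is a guess rather than something confirmed in this thread.

```python
from collections import Counter


def find_vcf_duplicates(lines):
    """Return (duplicate_ids, duplicate_sites) for an iterable of VCF text lines.

    Hypothetical helper: checks both the ID column and the
    (CHROM, POS, REF, ALT) combination, ignoring any "chr" prefix.
    """
    id_counts = Counter()
    site_counts = Counter()
    for line in lines:
        # Skip meta-information lines, the #CHROM header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Normalise "chr1" and "1" to the same key (an assumption).
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        # "." means the record has no ID, so it cannot be a duplicate ID.
        if vid != ".":
            id_counts[vid] += 1
        site_counts[(chrom, int(pos), ref, alt)] += 1
    dup_ids = sorted(k for k, n in id_counts.items() if n > 1)
    dup_sites = sorted(k for k, n in site_counts.items() if n > 1)
    return dup_ids, dup_sites


# Made-up records, purely for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\trs1\tA\tG",
    "chr1\t1000\trs2\tA\tG",  # same site as rs1 under a different ID
    "1\t2000\trs3\tC\tT",
    "2\t2000\trs3\tC\tT",     # same ID on a different chromosome
]
dup_ids, dup_sites = find_vcf_duplicates(example)
print(dup_ids)
print(dup_sites)
```

To run it on a real file, the lines can be passed in directly, e.g. `find_vcf_duplicates(open("target.vcf"))`, or `gzip.open("target.vcf.gz", "rt")` for a compressed file; both file names are placeholders.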

smlmbrt commented 9 months ago

@Humere - what did you do to fix it the first time? What options are you using to run pgsc_calc (and which version)?

smlmbrt commented 7 months ago

Stale request.