genetics-of-dna-methylation-consortium / godmc_phase2

This repository contains the code to run the analysis pipeline for phase 2 of goDMC released June 2024.
GNU General Public License v3.0

[Bug]: 14-nc886_gwas.sh; Error: Duplicate ID when using family data #69

Open ks164 opened 1 week ago

ks164 commented 1 week ago

Contact Details

ks164@duke.edu

Scripts

14-nc886_gwas.sh

What happened?

The script halts with a 'Duplicate ID' error when loading genetic data from related individuals (MZ and DZ twins).

The duplicate ID listed in the error is one of my FIDs, not an IID. Most of my FIDs are duplicated because the data come from twins.

Could this be due to the use of the --no-fid option?

The nc886_frequency.txt, nc886_groups.txt and nc886_scatter.jpeg files were all generated as expected.
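One way to check the --no-fid hypothesis is to see whether the first column of the phenotype file contains repeated values, since a file read without an FID column would have its first column treated as the sample ID. The sketch below is only an illustration in Python: the column layout of nc886_groups.txt is not shown in this report, so the header names and the example rows (apart from the FID 10009 taken from the error message) are assumptions, and `duplicated_first_column` is a hypothetical helper, not part of the pipeline.

```python
from collections import Counter
import io


def duplicated_first_column(lines):
    """Return first-column values that occur more than once.

    A leading header line is skipped if its first field is FID or IID
    (optionally prefixed with '#'). This is an assumed layout.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if i == 0 and fields[0].lstrip("#").upper() in {"FID", "IID"}:
            continue  # skip the header row
        counts[fields[0]] += 1
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical excerpt: two twins share FID 10009 but have distinct IIDs.
example = io.StringIO(
    "FID IID intermediate_non\n"
    "10009 1000901 1\n"
    "10009 1000902 2\n"
    "10010 1001001 1\n"
)
print(duplicated_first_column(example))
```

To run it on the real file, pass an open file handle instead of the example, for instance `duplicated_first_column(open("nc886_groups.txt"))`. If the first column turns out to hold FIDs shared between twins, that would be consistent with the duplicate reported in the log, although it would not by itself prove that --no-fid is the cause.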

How can the bug be reproduced?

No response

R version

4.3.3 (February, 2024)

Python version

None

Relevant log output

GoDMC2 version 1.0.0
Commit: adb482ef5c9153c2ff42e362b9f8cdfc992396c1
Commit date: 2024-10-21 17:55:15 +0100
Current time: Tue Oct 22 09:42:09 AM EDT 2024

Please ensure your scripts are up to date.
If in doubt, run 'git pull'

nc886 and clustering
Loading required package: matrixStats
Loading required package: janitor

Attaching package: ‘janitor’

The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test

Warning message:
package ‘janitor’ was built under R version 4.4.0 
Loading required package: ggplot2
Saving 7 x 7 in image
GWAS intermediated vs non-methylated
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/intermediate_non.log.
Options in effect:
  --bfile /hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data
  --ci 0.95
  --freq
  --geno-counts
  --glm allow-no-covars
  --hardy
  --maf 0.01
  --no-fid
  --out /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/intermediate_non
  --pheno /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/nc886_groups.txt
  --pheno-name intermediate_non

Start time: Tue Oct 22 09:43:09 2024
611824 MiB RAM detected; reserving 305912 MiB for main workspace.
Using 1 compute thread.
1502 samples (0 females, 0 males, 1502 ambiguous; 0 founders) loaded from
/hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data.fam.
7159238 variants loaded from
/hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data.bim.
Error: Duplicate ID '0 10009'.
End time: Tue Oct 22 09:43:12 2024