What happened?
The script halts with a 'Duplicate ID' error when loading genetic data from related individuals (MZ and DZ twins).
The duplicate ID listed is one of my FIDs, not an IID. Most of my FIDs are duplicated because the data come from twins.
Could this be due to the use of the --no-fid option?
The nc886_frequency.txt, nc886_groups.txt and nc886_scatter.jpeg files were all generated as expected.
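One way to check this hypothesis is sketched below. It assumes (not confirmed here) that --no-fid makes the first column of data.fam be read as the sample ID, in which case any FID shared by a twin pair would look like a duplicate. The function name `duplicate_ids` and the example IDs are hypothetical; only the FID 10009 comes from the log.

```python
from collections import Counter


def duplicate_ids(fam_lines):
    """Return (duplicated first-column values, duplicated (FID, IID) pairs)
    for whitespace-delimited .fam-style lines."""
    first_col = Counter()
    pairs = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        first_col[fields[0]] += 1
        pairs[(fields[0], fields[1])] += 1
    dup_first = sorted(k for k, n in first_col.items() if n > 1)
    dup_pairs = sorted(k for k, n in pairs.items() if n > 1)
    return dup_first, dup_pairs


# Hypothetical rows: one twin pair sharing an FID, plus one other sample.
example = [
    "10009 10009_1 0 0 0 -9",
    "10009 10009_2 0 0 0 -9",
    "10010 10010_1 0 0 0 -9",
]
dup_first, dup_pairs = duplicate_ids(example)
print("Duplicated first-column values:", dup_first)
print("Duplicated (FID, IID) pairs:", dup_pairs)
```

To run it on the real file, pass the lines of data.fam instead of `example`, for instance `duplicate_ids(open("data.fam"))`. If the first list is non-empty while the second is empty, the FID/IID pairs are unique and only the first column repeats, which would be consistent with the --no-fid explanation.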
How can the bug be reproduced?
No response
R version
4.3.3 (February, 2024)
Python version
None
Relevant log output
GoDMC2 version 1.0.0
Commit: adb482ef5c9153c2ff42e362b9f8cdfc992396c1
Commit date: 2024-10-21 17:55:15 +0100
Current time: Tue Oct 22 09:42:09 AM EDT 2024
Please ensure your scripts are up to date.
If in doubt, run 'git pull'
nc886 and clustering
Loading required package: matrixStats
Loading required package: janitor
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
Warning message:
package ‘janitor’ was built under R version 4.4.0
Loading required package: ggplot2
Saving 7 x 7 in image
GWAS intermediated vs non-methylated
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/intermediate_non.log.
Options in effect:
--bfile /hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data
--ci 0.95
--freq
--geno-counts
--glm allow-no-covars
--hardy
--maf 0.01
--no-fid
--out /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/intermediate_non
--pheno /hpc/group/sugdenlab/godmc_phase2/ERisk/results/14/nc886_groups.txt
--pheno-name intermediate_non
Start time: Tue Oct 22 09:43:09 2024
611824 MiB RAM detected; reserving 305912 MiB for main workspace.
Using 1 compute thread.
1502 samples (0 females, 0 males, 1502 ambiguous; 0 founders) loaded from
/hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data.fam.
7159238 variants loaded from
/hpc/group/sugdenlab/godmc_phase2/ERisk/processed_data/genetic_data/data.bim.
Error: Duplicate ID '0 10009'.
End time: Tue Oct 22 09:43:12 2024
Contact Details
ks164@duke.edu
Scripts
14-nc886_gwas.sh