dvitale199 / GenoTools

GenoTools: Advanced Genotype Data Analysis. A robust suite for processing genotype data, offering genotype calling (.idat to PLINK), comprehensive sample/variant QC, and ancestry estimation. Ideal for computational biology and genetics research.
Apache License 2.0

KeyError: 'FID' when merging for labeled_pruned_df #162

Closed zihhuafang closed 5 months ago

zihhuafang commented 7 months ago

Describe the bug

Version: 1.0.1

Command used:

genotools --bfile amppd_v4/amppd_ancenstry --out amppd_v4/ancenstry --ref_panel run_model/ref_panel --ref_labels run_model/ref_panel_ancestry_updated.txt --geno --hwe 1e-10 --ancestry

Error:

Running: geno with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_ancestry_AAC and output: /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_AAC_geno
Running: hwe with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_AAC_geno and output: amppd_v4/ancenstry_AAC
Running: geno with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_ancestry_FIN and output: /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_FIN_geno
Running: hwe with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_FIN_geno and output: amppd_v4/ancenstry_FIN
Running: geno with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_ancestry_MDE and output: /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_MDE_geno
Running: hwe with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_MDE_geno and output: amppd_v4/ancenstry_MDE
Running: geno with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_ancestry_AFR and output: /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_AFR_geno
Running: hwe with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_AFR_geno and output: amppd_v4/ancenstry_AFR
Running: geno with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_ancestry_SAS and output: /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_SAS_geno
Running: hwe with input /net/beegfs-hpc/work/fangz/GP2/GenoTools/amppd_v4/.gy8guo0c_tmp/ancenstry_SAS_geno and output: amppd_v4/ancenstry_SAS
Traceback (most recent call last):
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/bin/genotools", line 8, in <module>
    sys.exit(handle_main())
             ^^^^^^^^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/main.py", line 202, in handle_main
    labeled_pruned_df = pruned_df.merge(labels[['FID','IID','label']], how='left', on=['FID','IID'])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/pandas/core/frame.py", line 10805, in merge
    return merge(
           ^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 170, in merge
    op = _MergeOperation(
         ^^^^^^^^^^^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 794, in __init__
    ) = self._get_merge_keys()
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1310, in _get_merge_keys
    left_keys.append(left._get_label_or_level_values(lk))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/pandas/core/generic.py", line 1910, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'FID'

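The last pandas frames in the traceback go through `left._get_label_or_level_values`, which suggests the key is missing from the left-hand frame (`pruned_df`), not from `labels`. Below is a minimal sketch of that failure mode. The two frames are made-up stand-ins: the real columns of `pruned_df` are not shown in the logs, so the missing `FID` column is an assumption, and `try_merge` is a hypothetical helper, not part of GenoTools.

```python
import pandas as pd

# Made-up stand-ins for the frames merged in genotools/main.py.
# ASSUMPTION: the left frame has no FID column; the logs do not confirm this.
pruned_df = pd.DataFrame({"IID": ["s1", "s2"]})
labels = pd.DataFrame(
    {"FID": ["f1", "f2"], "IID": ["s1", "s2"], "label": ["SAS", "SAS"]}
)


def try_merge(left, right):
    """Hypothetical helper: return the missing key name if the merge
    raises KeyError, otherwise None."""
    try:
        # Same call shape as the line shown in the traceback.
        left.merge(right[["FID", "IID", "label"]], how="left", on=["FID", "IID"])
    except KeyError as err:
        return err.args[0]
    return None


missing_key = try_merge(pruned_df, labels)
print(missing_key)

# With FID present on the left frame, the same merge goes through.
ok_key = try_merge(pruned_df.assign(FID=["f1", "f2"]), labels)
print(ok_key)
```

If this matches what happens in the real run, printing `pruned_df.columns` just before the merge would show whether `FID` is present.
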
Not sure why SAS failed, but the ancenstry_SAS pfiles do contain samples and variants:

-rw-r--r-- 1 fangz 4.7M Feb 8 21:35 ancenstry_SAS.pvar
-rw-r--r-- 1 fangz 418 Feb 8 21:35 ancenstry_SAS.psam
-rw-r--r-- 1 fangz 737K Feb 8 21:35 ancenstry_SAS.pgen
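
One thing that might be worth checking is whether the .psam header has an FID column at all, since PLINK 2 allows .psam files whose header starts with `#IID` and has no FID. The sketch below reads only the header line. `psam_columns` and `has_fid` are hypothetical helper names, the demo file contents are invented, and the code assumes the file has a header line. Whether this is related to the KeyError above is only a guess.

```python
import tempfile
from pathlib import Path


def psam_columns(path):
    """Return the column names from the header line of a .psam file.

    ASSUMPTION: the file has a header line starting with '#'
    (for example '#FID IID SEX' or '#IID SEX').
    """
    with open(path) as fh:
        header = fh.readline().strip()
    return header.lstrip("#").split()


def has_fid(path):
    """True if the .psam header lists an FID column."""
    return "FID" in psam_columns(path)


# Demo on a tiny invented file, not the real ancenstry_SAS.psam.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.psam"
    demo.write_text("#IID\tSEX\ns1\t1\ns2\t2\n")
    demo_cols = psam_columns(demo)
    demo_has_fid = has_fid(demo)

print(demo_cols, demo_has_fid)
```

Running `has_fid("amppd_v4/ancenstry_SAS.psam")` against the real output would show whether the per-ancestry files carry an FID column.
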

Based on the logs, no step appears to have failed. I have attached all the logs generated by genotools here: amppd_ancenstry.log, ancenstry_all_logs.log, ancenstry_cleaned_logs.log