KangchengHou / admix-kit

Toolkit for analyzing genetics data from admixed populations
https://kangchenghou.github.io/admix-kit

ValueError: Lengths must match to compare #12

Closed nrosewick closed 2 years ago

nrosewick commented 2 years ago

Hello,

I am trying to use admix-kit with my dataset. I first downloaded the plink2 files for 1000G from the plink2 website: https://www.cog-genomics.org/plink/2.0/resources

Then I decompressed and processed them the same way you described in the toy.sh script.

Here, for chr 1:
plink2 --pfile chr1_phase3  --rm-dup exclude-all --max-alleles 2 --maf 0.01 --snps-only --seed 0 --make-pgen --out chr1_phase3.admix

Then I ran admix on it:

admix lanc --pfile mydataset.chr1.QCed --ref-pfile chr1_phase3.admix --ref-pop-col "SuperPop" --ref-pops "EUR,AFR,AMR,EAS,SAS" --out test.lanc

But I got an error:

Traceback (most recent call last):
  File "../bin/admix", line 392, in <module>
    fire.Fire()
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "../bin/admix", line 22, in lanc
    assert np.all(sample_dset.snp.index == ref_dset.snp.index), (
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/pandas/core/ops/common.py", line 65, in new_method
    return method(self, other)
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/pandas/core/arraylike.py", line 29, in __eq__
    return self._cmp_method(other, operator.eq)
  File "/home/nicolas/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5615, in _cmp_method
    raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

Any idea how to solve this?

Toy dataset works well btw.

Thanks

nrosewick commented 2 years ago

I solved it by making sure the variant list in my dataset and reference are the same.

KangchengHou commented 2 years ago

Indeed, this is because the variant lists are inconsistent between the two data sets.
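For anyone hitting the same traceback, the failure can be reproduced outside admix-kit with two pandas indexes of different lengths. This is only a minimal sketch: the variant IDs are made up for illustration and are not taken from either data set. It shows that `==` raises before `np.all` is ever evaluated, whereas `Index.equals` simply reports that the two indexes differ.

```python
import numpy as np
import pandas as pd

# Hypothetical variant IDs, for illustration only
sample_snps = pd.Index(["1:1000:A:G", "1:2000:C:T", "1:3000:G:A"])
ref_snps = pd.Index(["1:1000:A:G", "1:3000:G:A"])

# Element-wise comparison of indexes with different lengths raises,
# so np.all never gets a boolean array to reduce.
try:
    np.all(sample_snps == ref_snps)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# Index.equals does not raise on a length mismatch; it returns a falsy value.
same_variants = sample_snps.equals(ref_snps)
print(same_variants)
```

So a `ValueError` here (rather than the assertion message) just means the two variant lists have different lengths.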

I plan to handle the scenario where the variant lists mismatch by first taking the intersection of the two data sets.
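Until then, the intersection can be computed by hand. The following is a rough sketch, not admix-kit code: `shared_variants` is a hypothetical helper, the IDs are made up, and it assumes both data sets use the same variant ID convention and contain no duplicate IDs.

```python
import pandas as pd


def shared_variants(sample_ids, ref_ids):
    """Return the variant IDs present in both lists, keeping the sample order."""
    sample_index = pd.Index(sample_ids)
    ref_index = pd.Index(ref_ids)
    # isin builds a boolean mask over the sample IDs, so order is preserved
    return sample_index[sample_index.isin(ref_index)]


# Hypothetical IDs, for illustration only
sample_ids = ["rs1", "rs2", "rs3", "rs5"]
ref_ids = ["rs5", "rs2", "rs9"]

common = shared_variants(sample_ids, ref_ids)
print(list(common))
```

One way to use the result is to write the shared IDs to a text file, one per line, and subset both the sample and the reference files to that list (for example with plink2's `--extract`) before running `admix lanc`.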

Also, I want to note that unless one is sure that the data set is indeed a five-way admixture, one should specify a more constrained set of ref-pops. I'll add this to the documentation.

Thanks for your interest!

I am reopening this for the following issues:

KangchengHou commented 2 years ago

When the variant lists mismatch, we recommend using RFmix, which handles this within the software. See details at https://kangchenghou.github.io/admix-kit/rfmix.html