Closed nrosewick closed 2 years ago
Thanks for the interest in this software.
On first error, it seems that there is some missing data in genotype or phenotype (which is not properly handled currently). I will work on fixing that.
Also when will the .lanc file will be used in the association as we do not provide it to the function ?
.lanc file will always be read if there is a
I will make it clear in the document.
So should I first filter the plink file to remove variants with missing genotype (--geno 0 option in plink). This will remove a significant fraction of the variants in my dataset though.
Thank you
Also question related to the .lanc file. How will the tool know where the .lanc is ? Maybe adding a --lanc parameter in the assoc-quant tool is a good idea in order to avoid ambiguity.
Thank you for the great work
Yes i agree you shouldn't use --geno 0. I will implement the genotype imputation within the software, by imputing the missing genotype with average for each SNP. I'll try to work on this early next week.
Currently, i assumed the file structure is like: for every genotype file, there are four or more associated files
So local ancestry file must be associated with a [pfile]. This makes sure that the .lanc (which corresponds to a matrix) always match a corresponding genotype file. I believe this makes the files more structured, rather than assuming that users would know the correspondence. What do you think?
Is it not possible to perform regression only on the samples without missing genotype ? In general plink dataset are preprocessed to filter out SNP with more than 5 or 10% of samples with missing GT. Imputation may induce errors as GT can be very ancestry dependent.
Nicolas
Le jeu. 2 déc. 2021 à 18:10, Kangcheng Hou @.***> a écrit :
Yes i agree you shouldn't use --geno 0. I will implement the genotype imputation within the software, by imputing the missing genotype with average for each SNP. I'll try to work on this early next week.
Currently, i assumed the file structure is like: for every genotype file, there are four or more associated files
- [pfile].pgen
- [pfile].pfam
- [pfile].pvar
- [pfile].lanc
So local ancestry file must be associated with a [pfile]. This makes sure that the .lanc (which corresponds to a matrix) always match a corresponding genotype file. I believe this makes the files more structured, rather than assuming that users would know the correspondence. What do you think?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/KangchengHou/admix-kit/issues/13#issuecomment-984826033, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3PHXVCNBP7DOUSLCCV26DUO6R7DANCNFSM5JCDFEEQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Dear Nicolas,
Thanks for your suggestion. I have updated the assoc-quant to cope with potential missing genotype and phenotype (if there are missing genotype, the corresponding individuals will be dropped).
Since I updated the code a bit, do you mind following the install and Quick start (command line interface) process (i also add the toy covariate file) again and let me know how it works on your data? Thanks!
Kangcheng
Thank you for the fast update of your work.
I tried the new code on the toy dataset and I've an error at the assoc-quant step :
Any idea where I messed up ? Did I miss some package maybe ?
Traceback (most recent call last):
File "../bin/admix", line 440, in <module>
fire.Fire()
File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/nicolas/anaconda3/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "../bin/admix", line 425, in assoc_quant
dict_rls[m] = admix.assoc.marginal(
File "/home/nicolas/Documents/erasme/projects/HECAM/admix-kit/admix/assoc/__init__.py", line 234, in marginal
var[:, 0::3] = allele_per_anc[:, :, 0]
File "/home/nicolas/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 1657, in __setitem__
raise NotImplementedError(
NotImplementedError: Item assignment with <class 'tuple'> not supported
Thanks for your patience. It should be the dask version problem. Could you run pip install -U "dask[array]"
and try again? I also updated the requirements.txt file
Thank you. admix-kit is now working well :-)
I will process my whole dataset and keep you inform if I have some questions.
Thank you again for the great work :+1:
Thanks for the interest in this software.
On first error, it seems that there is some missing data in genotype or phenotype (which is not properly handled currently). I will work on fixing that.
Also when will the .lanc file will be used in the association as we do not provide it to the function ?
.lanc file will always be read if there is a .lanc, however will only be used when using association testing methods that require local ancestries, such as SNP1, and Tractor.
I will make it clear in the document.
- [x] missing data in genotype or phenotype in association testing
- [ ] add binary trait association, detect if case-control phenotype is detected
- [ ] document use of .lanc in association testing
Hello, My name is Thiago Peixoto Leal and I work with Parkinson Disease in Latin American populations. I would like to know if you have a forecast of when the package will have its version for binary traits. Thank you and congratulations for the good work.
Dear Thiago,
The binary trait association has been implemented, however not well tested, and i should be able to have an update in a month. I would recommend two possible routes for now:
admix assoc
with quantative traits mode, this will give you valid results, as in large sample size, using either logistic regression or linear regression will roughly give you similar results.plink --glm
https://www.cog-genomics.org/plink/2.0/assoc (which is also super fast)Dear Thiago,
The binary trait association has been implemented, however not well tested, and i should be able to have an update in a month. I would recommend two possible routes for now:
- Use the current
admix assoc
with quantative traits mode, this will give you valid results, as in large sample size, using either logistic regression or linear regression will roughly give you similar results.- As our paper https://www.nature.com/articles/s41588-021-00953-5 show that ATT, or methods do not model ancestry-specific effects are likely to be most powerful. You can perform association testing with
plink --glm
https://www.cog-genomics.org/plink/2.0/assoc (which is also super fast)
Hello,
First of all, thank you very much for the fast answer.
I am interest on Ancestry Based methods because I work with a 3-way admixed population. In our previous work we used AM (https://onlinelibrary.wiley.com/doi/full/10.1002/ana.26153) and found new regions, but we did not replicate because we do not have good replication cohorts for AM. I also implemented a TRACTOR (using Python and R) and the results was really better than a regular GWAS (new regions, better p-values, etc).
Unfortunately our N is relatively small (1.5 k individuals) so I will probably wait for the implementation of Logistic methods on Admixt-kit. Just a suggestion: put the pheno and covar files in the test folder because there is none of these files there.
Again thank you very much.
Best,
Thiago Peixoto Leal
Your results are indeed very interesting. And thanks for your suggestion. I will follow up soon on this.
Hi @ThiagoPL
Please clone the latest code and reinstall as follows:
git clone https://github.com/KangchengHou/admix-kit
cd admix-kit
pip install -r requirements.txt
pip install -U git+https://github.com/bogdanlab/tinygwas.git#egg=tinygwas
pip install -e .
for a relatively stable version of code (including logistic regression). And see https://kangchenghou.github.io/admix-kit/cli.html#association-testing for details of the parameters.
Feel free to let me know if you have questions or suggestions. I am happy to add more features to make the software easy and robust to use.
Hi @ThiagoPL
Please clone the latest code and reinstall as follows:
git clone https://github.com/KangchengHou/admix-kit cd admix-kit pip install -r requirements.txt pip install -U git+https://github.com/bogdanlab/tinygwas.git#egg=tinygwas pip install -e .
for a relatively stable version of code (including logistic regression). And see https://kangchenghou.github.io/admix-kit/cli.html#association-testing for details of the parameters.
Feel free to let me know if you have questions or suggestions. I am happy to add more features to make the software easy and robust to use.
Hello @KangchengHou .
Thank you very much for the help. I will test here and I text you with the feedback.
I follow the different steps in order to perform association analysis
I have a .lanc file for mydataset.chr1 generated from admix lanc
admix assoc-quant --pfile data/mydataset.chr1 --pheno samples_pheno.txt --pheno-col PHENO --method ATT --out test.assoc
samples_pheno.txt has the same samples order and has two columns. col1 (indiv) contains sample id and col2 (PHENO) contains phenotype (2 : cases ; 1 : controls )
An error occured :
Also when will the .lanc file will be used in the association as we do not provide it to the function ?
Thank you