maernster opened this issue 5 years ago (status: open).
convertf allows you to change the maximum chromosome number, but unfortunately there is a bug: conversion from PLINK does not support that. I will fix it soon, but that is no help to you right now. Perhaps you can recode your chromosome numbers and positions to something convertf will handle and then convert them back. I'm sorry not to be more help.
Nick
Original post by maernster, Fri, Aug 23, 2019 at 10:16 AM:
Hi there,
I would like to use convertf to convert from VCF to EIGENSTRAT format to test for hybridization using ADMIXTOOLS. The problem is that I am working with data from non-model organisms, so my chromosome IDs don't refer to actual chromosomes but to scaffolds.
When I run convertf I get an error saying: warning (mapfile): bad chrom: 100
I realized that when I set all chromosomes to 1, sort the variants and remove duplicates, convertf works. But I don't want to lose the actual SNP order. Also, I'd prefer not to remove duplicate variants...
Is there any way to circumvent this and to be able to generate an eigenstrat file without having to change the chrom IDs?
Thanks in advance!
I am having the same issue (using scaffolds, not chromosomes) and getting the bad chrom error. Has this been resolved? If not, @maernster, did you find a solution?
Thanks, Rebecca
The latest version of ADMIXTOOLS supports blockname:, which lets you deal flexibly with scaffolds. Here's how:
1) Map your SNPs onto chromosomes and positions arbitrarily. Genetic distance 0 is OK. Choose chromosome numbers in the range 1-22.
2) Make a file myblocks of the form
snpname1 1
snpname2 1
snpname3 2
...
where the first 2 SNPs belong to one scaffold and the next to a different one. Run ADMIXTOOLS with
blockname: myblocks
Good luck, and make sure you have the latest version (7.0).
Nick
(In reply to Rebecca Stubbs, Mon, Jun 29, 2020 at 12:53 PM)
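Step 2 can be sketched in a few lines of Python. This is not code from the thread or from ADMIXTOOLS: `blockname_lines` is a hypothetical helper, and it assumes you already have each SNP's name and the scaffold it came from, in the same order as the .snp file. Each scaffold gets one integer block label, matching the "snpname blocknumber" layout Nick describes.

```python
def blockname_lines(snps):
    """Build the lines of a blockname: file (hypothetical helper).

    snps: iterable of (snp_name, scaffold) pairs in .snp file order.
    Each scaffold gets an integer block label 1, 2, 3, ... in order of
    first appearance, so every scaffold becomes one jackknife block.
    """
    block_ids = {}  # scaffold name -> integer block label
    lines = []
    for snp_name, scaffold in snps:
        if scaffold not in block_ids:
            block_ids[scaffold] = len(block_ids) + 1
        lines.append(f"{snp_name} {block_ids[scaffold]}")
    return lines


if __name__ == "__main__":
    # Hypothetical SNP and scaffold names, for illustration only.
    example = [
        ("snpname1", "scaffold_A"),
        ("snpname2", "scaffold_A"),
        ("snpname3", "scaffold_B"),
    ]
    print("\n".join(blockname_lines(example)))
    # snpname1 1
    # snpname2 1
    # snpname3 2
```

Writing these lines to a file named myblocks gives the file referenced by blockname: myblocks. The SNP names must match those in the .snp file exactly.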
Great, thanks!
Hi,
I also have a dataset with 4761 scaffolds and I am having trouble converting my plink dataset to eigenstrat format with your convertf utility.
When I do attempt to run convertf I receive this error: bad chrom: NW_020356435.1
I tried renaming the scaffolds to remove non-alphanumeric characters. This only seems to make a difference when I change the scaffold names to chr, and even then it only seems to work when there is a limited number of chromosomes.
I have also tried the method you described above, but I receive the same error: bad chrom: chr4635
This is the format of the .map file:
chr4635 chr4635_914 0 914
chr4635 chr4635_1113 0 1113
...
This is the format of the block file:
chr4635_914 1
chr4635_1113 1
I made sure to install the latest version of the software.
Is there any way to generate an eigenstrat file without having to change the chromosome IDs? If this is unavoidable, is there an upper limit on the number of chromosomes/scaffolds?
Any advice would be greatly appreciated.
Thanks
Tatiana
1) Convert your data to .ind/.snp/.geno, arbitrarily mapping your SNPs to chromosomes 1-22. The chromosome names MUST be small integers.
2) Then use the blockname: parameter. Both of your examples of bad chrom: have alphabetic characters in the chromosome name.
Nick
(In reply to trfeuerborn, Wed, Jul 8, 2020 at 7:22 AM)
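To see in advance which labels will trigger bad chrom:, you can scan the chromosome column before running convertf. `bad_chroms` is a hypothetical helper, not part of ADMIXTOOLS or EIGENSOFT. Its `max_chrom` default of 22 follows Nick's advice to use chromosomes 1-22; elsewhere the thread puts the hard ceiling just under 100, so treat the limit as an assumption.

```python
def bad_chroms(lines, col=1, max_chrom=22):
    """Return distinct chromosome labels that are not integers in 1..max_chrom.

    col=1 reads an EIGENSTRAT .snp file
      (SNP name, chromosome, genetic position, physical position, ...);
    col=0 reads a PLINK .map file
      (chromosome, SNP ID, genetic distance, base-pair position).
    max_chrom=22 is an assumed default, not a documented ADMIXTOOLS limit.
    """
    bad = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label = fields[col]
        # isdecimal() rejects labels like "chr4635" or "NW_020356435.1"
        ok = label.isdecimal() and 1 <= int(label) <= max_chrom
        if not ok and label not in bad:
            bad.append(label)
    return bad


if __name__ == "__main__":
    map_lines = [
        "chr4635 chr4635_914 0 914",
        "chr4635 chr4635_1113 0 1113",
        "NW_020356435.1 snp_x 0 50",  # snp_x, snp_y: hypothetical SNP IDs
        "3 snp_y 0 70",
    ]
    print(bad_chroms(map_lines, col=0))  # ['chr4635', 'NW_020356435.1']
```

An empty result means every chromosome label is already a small integer in the chosen range.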
Hello,
I am encountering the same issue as reported here. My SNPs are located on 260 contigs, and I cannot replace their names with values from 1 to 260, as the maximum seems to be 100. Of course, if I change all the #CHR names to the value "1", then it works, but I don't want to lose the precious information about each SNP's location.
@bumblenick I read your suggestion, but could you please explain a little more in detail what you mean by: "Map your data onto .ind .snp .geno by arbitrarily mapping your snps to chromosomes 1-22"
I have 260 contigs, many more than 22... so is it actually possible to do something?
Thanks for any help :) All the best, Marvin
Here's how to work with contigs. 1) Map your SNPs to (say) C1 (chromosome 1), positions 1, 2, 3, ..., with SNP names (say) X1, X2, X3, ... and genetic position 0.
2) In ADMIXTOOLS use blockname:
blockname:
The SNP positions/distances are only used in ADMIXTOOLS to define blocks for the jackknife; blockname: overrides this.
The recent program qpfstats does not yet support blockname. That's for the next release.
Nick
(In reply to Marvin02860, Wed, Aug 26, 2020 at 10:59 AM)
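Nick's step 1 can be sketched as a rewrite of an existing .snp file. This is an illustration, not code from ADMIXTOOLS: `to_single_chrom` is hypothetical, and it assumes the input follows the EIGENSTRAT .snp layout (SNP name, chromosome, genetic position, physical position, optional reference and variant alleles) with the contig name in the chromosome column. Row order is preserved, so the .geno file still lines up.

```python
def to_single_chrom(snp_lines, chrom=1):
    """Put every SNP on one pseudo-chromosome (hypothetical helper).

    SNP k (1-based, file order) becomes name Xk on chromosome `chrom`,
    genetic position 0.0, physical position k. Allele columns (5th and
    6th), if present, are kept unchanged.

    Returns (new_lines, pairs); pairs lists (new_name, original_contig),
    which is what a blockname: file needs to keep one block per contig.
    """
    new_lines, pairs = [], []
    k = 0
    for line in snp_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines without using up a position
        k += 1
        name = f"X{k}"
        alleles = fields[4:6]
        new_lines.append(" ".join([name, str(chrom), "0.0", str(k), *alleles]))
        pairs.append((name, fields[1]))
    return new_lines, pairs


if __name__ == "__main__":
    # Hypothetical contig names ctgA/ctgB.
    example = [
        "ctgA_10 ctgA 0.0 10 A G",
        "ctgA_55 ctgA 0.0 55 C T",
        "ctgB_7 ctgB 0.0 7 G A",
    ]
    new_lines, pairs = to_single_chrom(example)
    print("\n".join(new_lines))  # X1 1 0.0 1 A G, then X2 ..., then X3 ...
    print(pairs)  # [('X1', 'ctgA'), ('X2', 'ctgA'), ('X3', 'ctgB')]
```

Because the SNPs are renamed, the blockname: file must refer to the new names (X1, X2, ...), not the original ones.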
Thank you very much for your quick reply, very appreciated :)
Do you think the blockname option can work with admixr?
I have tried to make ADMIXTOOLS work via admixr as follows:
result <- d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, blockname = "my_contigs.txt")
but it is not working:
"Error in d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, : unused argument (blockname = "my_contigs.txt")"
Thanks again! All the best, Marvin
I think you will have to contact the authors of admixr. ADMIXTOOLS is not a static package and other implementations will need updates from time to time to obtain all the features.
Nick
(In reply to Marvin02860, Fri, Aug 28, 2020 at 1:59 PM)
Hello,
I just wanted to comment here on the solution to the problem mentioned, since I finally managed to get everything up and running:
My initial problem was to run ADMIXTOOLS (preferably via admixr) on non-human data: a few thousand SNPs located on 260 contigs. ADMIXTOOLS does not recognise non-standard (non-human) chromosome names (such as contig or scaffold names, or even large integers such as 100). In my case, my CHR names look like "ctgxxxxxxxxxxxxxxxxx".
Starting with a VCF file containing my SNPs, I used the script https://raw.githubusercontent.com/mathii/gdc/master/vcf2eigenstrat.py to convert my VCF to the EIGENSTRAT format required by ADMIXTOOLS (resulting in 3 files: .ind, .geno and .snp).
In the .snp file, my non-standard chromosome names are a problem for running ADMIXTOOLS. I need to modify the first column (SNP ID) and the second column (CHR ID) of that file as follows:
First column (SNP ID): replace the SNP IDs with integers from 1 to 2391 (= my total number of SNPs).
Second column (CHR ID): replace the CHR names (or contig/scaffold names) with integers from 1 to 22 (arbitrarily).
To keep track of where the SNPs really are, which is necessary for defining the jackknife blocks, I need to make another file defining the blocks (= which SNP belongs to which contig). This info is important to allow calculation of the Z-score (statistical significance). The file can be called "my_contigs.txt" and looks like:
1st column = list of SNP IDs as integers from 1 to 2391
2nd column = the contigs/scaffolds where the SNPs are actually located. These cannot be the original complicated names like "ctgxxxxxxxxxxxxx"; instead they need to be integers, in this case 1 to 260.
I can now run ADMIXTOOLS with admixr using the option params = list(blockname = "my_contigs.txt")
For instance, in R: D_stat_1 <- d(W = popsCFsymp, X = "CF_scot", Y = popsCGsymp, Z = popsCh, data = snps, params = list(blockname = "my_contigs.txt"))
Thank you again Nick for your precious help. All the best, Marvin
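Marvin's recipe can be condensed into one Python pass over the .snp file produced by vcf2eigenstrat.py. `remap_snp_file` is hypothetical and not part of ADMIXTOOLS, admixr or vcf2eigenstrat.py. It assumes the contig name is in the second column and that each contig's SNPs are contiguous in the file, as in a sorted VCF. Two choices are assumptions of this sketch rather than Marvin's: contigs are spread over chromosomes 1-22 in contiguous chunks, and each SNP's physical position is replaced by its row number so no two SNPs share a position.

```python
def remap_snp_file(snp_lines, n_chrom=22):
    """Remap a .snp file whose chromosome column holds contig names (sketch).

    Returns (snp_out, block_out, contig_table):
      snp_out      new .snp lines: integer SNP ID, chromosome in 1..n_chrom,
                   genetic position 0.0, position = row number, alleles kept
      block_out    blockname: lines "snp_id contig_id"
      contig_table "contig_id contig_name" lines to map blocks back to contigs
    """
    rows = [line.split() for line in snp_lines if line.strip()]

    # Integer contig IDs 1..K in order of first appearance.
    contig_ids = {}
    for fields in rows:
        contig_ids.setdefault(fields[1], len(contig_ids) + 1)
    n_contigs = len(contig_ids)

    snp_out, block_out = [], []
    for snp_id, fields in enumerate(rows, start=1):
        contig_id = contig_ids[fields[1]]
        # Assumption: contiguous chunks of contigs per chromosome, so the
        # chromosome never decreases along the file if contigs are contiguous.
        chrom = (contig_id - 1) * n_chrom // n_contigs + 1
        # Assumption: row number as position, so positions are unique.
        snp_out.append(" ".join([str(snp_id), str(chrom), "0.0", str(snp_id), *fields[4:6]]))
        block_out.append(f"{snp_id} {contig_id}")

    contig_table = [f"{cid} {name}" for name, cid in contig_ids.items()]
    return snp_out, block_out, contig_table


if __name__ == "__main__":
    example = [  # hypothetical contigs ctgA and ctgB
        "ctgA_100 ctgA 0.0 100 A G",
        "ctgA_200 ctgA 0.0 200 C T",
        "ctgB_50 ctgB 0.0 50 G A",
    ]
    snp_out, block_out, contig_table = remap_snp_file(example)
    print(snp_out)       # ['1 1 0.0 1 A G', '2 1 0.0 2 C T', '3 12 0.0 3 G A']
    print(block_out)     # ['1 1', '2 1', '3 2']
    print(contig_table)  # ['1 ctgA', '2 ctgB']
```

snp_out would replace the .snp file (row order is unchanged, so the .geno file still matches), block_out would go into my_contigs.txt, and contig_table keeps the link from block numbers back to the original contig names.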
Trying to work through this problem myself whilst converting .ped files to EIGENSTRAT format to run smartpca.
Is there any intention to allow non-standard chromosome names (as plink allows with --allow-extra-chr)?
It does limit and complicate the procedure for those who aren't "fortunate" enough to work on model organisms. I am new to bioinformatics and I am finding it difficult to apply these methods to my dataset.
Thank you for the help on this thread - hopefully I will be able to crack it!
I will admit to a design flaw in Admixtools, where chromosomes are required to be small integers (< 99). If rewriting the code I would not do this, but fixing it is a large problem which I don't have the resources to do.
The workaround is typically not so bad, though: just remap your chromosome names to small integers. Furthermore, the SNP positions and genetic distances are only used for the block jackknife, and for most of the programs you can control the blocks using the blockname: parameter. This was introduced precisely to help with non-model organisms (for example, it allows each block to be an unmapped contig).
If you need blockname: in an ADMIXTOOLS program that does not yet implement it, write to me and I will try to add it in the next release.
Nick Patterson 2/10/21
(In reply to EveTC, Wed, Feb 10, 2021 at 10:19 AM)
Hi Nick, thank you for your response and help :) Would you mind clarifying how I would remap my chromosomes to new names? Sorry if this is a simple question - I'm very new to all this.
Dear EveTC, please email me directly; this thread should be taken off GitHub.
(In reply to EveTC, Wed, Feb 10, 2021 at 11:03 AM)
Will do - thank you Nick
Dear Nick and EveTC,
I would also be interested in running smartpca on whole-genome resequencing data with a reference genome composed of 56 scaffolds. I can see smartpca only uses the SNPs on the first 22 scaffolds, but I would like to run it on all scaffolds, given that I have some very low-coverage ancient samples. Would either of you have a utility to remap scaffold names to smaller integers?
Thank you, Best wishes,
Marie
1) Make your scaffold names integers 1..56.
2) Run smartpca with numchrom: 56
Pretty easy!
Nick
(In reply to mariels, Thu, May 13, 2021 at 2:44 PM)
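For PLINK input, as in EveTC's .ped case, only the .map file carries chromosome names; the .ped file has no chromosome column and can stay as it is. A minimal sketch of Nick's step 1 follows. `remap_map_chroms` is a hypothetical helper that gives each scaffold its own integer, keeps positions unchanged, and returns the count K to use as numchrom: K. Per the thread, K should stay well under 100.

```python
def remap_map_chroms(map_lines):
    """Replace scaffold names in PLINK .map lines with integers 1..K (sketch).

    .map columns: chromosome, SNP ID, genetic distance, base-pair position.
    Each distinct scaffold becomes its own integer chromosome, in order of
    first appearance; all other columns are left untouched.

    Returns (new_lines, lookup, k): lookup maps integer -> original scaffold
    name, and k is the value to give smartpca as numchrom:.
    """
    ids = {}  # scaffold name -> integer
    new_lines = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom = ids.setdefault(fields[0], len(ids) + 1)
        new_lines.append("\t".join([str(chrom), *fields[1:]]))
    lookup = {num: name for name, num in ids.items()}
    return new_lines, lookup, len(ids)


if __name__ == "__main__":
    example = [  # hypothetical scaffold and SNP names
        "scaffold_1 snp1 0 1500",
        "scaffold_1 snp2 0 2300",
        "scaffold_9 snp3 0 800",
    ]
    new_lines, lookup, k = remap_map_chroms(example)
    print("\n".join(new_lines))
    print(lookup)            # {1: 'scaffold_1', 2: 'scaffold_9'}
    print(f"numchrom: {k}")  # numchrom: 2
```

Keeping `lookup` alongside the results makes it possible to translate chromosome numbers back to scaffold names afterwards.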
Thanks Nick, that was indeed very easy, I did not notice the numchrom option.