Open mcetincompbio opened 6 years ago
Do you have readgroups (RGs) added to each of your bam files? You can do this using the PICARD AddOrReplaceReadGroups function
https://broadinstitute.github.io/picard/command-line-overview.html
Thanks!
Now I am getting a different error.
Traceback (most recent call last):
File "SNPbam2vcf.py", line 246, in
Is there an index file for each corresponding bam file present?
Yes. The index files are present in the same folder as the bams. bam.list contains the paths of the bams only.
Any idea what might be happening?
It’s hard to parse from the error message alone.
Could you send us one of your bam files to play with (or a portion of them that includes the header and a subset of reads)?
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
On Oct 10, 2018, at 2:34 AM, mcetincompbio notifications@github.com wrote:
Any idea what might be happening?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Here is one of the bams: ftp://ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172924/bam/NA12877_S1.bam
And another, if you need two: ftp://ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172924/bam/NA12890_S1.bam
Do you need the SNP list file as well?
Yes, please send me the snp list.
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
On Oct 10, 2018, at 9:39 AM, mcetincompbio notifications@github.com wrote:
Here is one of the bams: ftp://ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172924/bam/NA12877_S1.bam
And another, if you need two: ftp://ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172924/bam/NA12890_S1.bam
Do you need the SNP list file as well?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Two list files are here:
https://www.justbeamit.com/y9x72
Let me know if there is a problem.
Thanks!!
Apparently this download link "no longer exists".
On Fri, Oct 12, 2018 at 4:32 AM mcetincompbio notifications@github.com wrote:
Two list files are here:
https://www.justbeamit.com/y9x72
Let me know if there is a problem.
Thanks!!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/COMBINE-lab/maximum-likelihood-relatedness-estimation/issues/15#issuecomment-429248542, or mute the thread https://github.com/notifications/unsubscribe-auth/AM8_k74uz2QIJLkcde7Ghg2RDvY-JJaMks5ukFOfgaJpZM4WvtuR .
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
Can you send me your file with the list of bams? Do you have an extra carriage return, the script seems to be looking for a file '' that is not there.
That was it!
Thanks a lot. I feel dumb but these things happen...
Now I run the program with the published CEPH family data (whose relatedness is known), and get the following result:
Ind1 Ind2 k0_hat k1_hat k2_hat pi_HAT nbSNP NA12883 NA12890 1.000 0.000 0.000 0.000 1495 NA12877 NA12889 0.999 0.000 0.001 0.001 1496 NA12877 NA12885 0.999 0.001 0.000 0.000 1497 NA12885 NA12889 0.999 0.001 0.000 0.000 1496 NA12883 NA12889 0.999 0.001 0.000 0.000 1493 NA12885 NA12890 1.000 0.000 0.000 0.000 1497 NA12889 NA12890 1.000 0.000 0.000 0.000 1495 NA12877 NA12883 0.999 0.001 0.000 0.000 1494 NA12883 NA12885 0.881 0.002 0.118 0.119 1496 NA12877 NA12890 0.989 0.010 0.000 0.006 1497
NA12877 and NA12889 are first degree relatives. But we get a k0_hat of 0.999 which, if I'm not mistaken, means that no relatedness is found.
What could be the reason for this?
I am downloading the bams from the links I gave earlier.
How were your SNPs chosen?
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
On Oct 16, 2018, at 9:52 AM, mcetincompbio notifications@github.com wrote:
NA12877 and NA12889
I took all snps in the 1000 genomes dataset and thinned it so that the snps are 100kb apart.
Was there a criteria to the thinning (MAF in CEU?). If the SNPs are not variable or vary rare in your pedigree then there will be no signal.
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
On Oct 16, 2018, at 10:09 AM, mcetincompbio notifications@github.com wrote:
I took all snps in the 1000 genomes dataset and thinned it so that the snps are 100kb apart.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Yes. You should preferably choose SNPs where the AF is >10% in CEU individuals.
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
On Oct 17, 2018, at 3:57 AM, mcetincompbio notifications@github.com wrote:
It seems like a great majority (possibly more than 95%) of the SNPS are 0/0 in all 5 individuals of the CEPH family.
Maybe that's why.
Should I try with a different set of snps?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Really appreciate your help!
I did that, and got the following result:
Ind1 Ind2 k0_hat k1_hat k2_hat pi_HAT nbSNP NA12885 NA12889 1.000 0.000 0.000 0.000 17665 NA12883 NA12890 1.000 0.000 0.000 0.000 17655 NA12889 NA12890 1.000 0.000 0.000 0.000 17656 NA12885 NA12890 1.000 0.000 0.000 0.000 17653 NA12877 NA12890 0.998 0.001 0.001 0.001 17661 NA12883 NA12889 0.999 0.001 0.000 0.000 17663 NA12877 NA12885 0.999 0.001 0.000 0.000 17667 NA12877 NA12883 0.999 0.001 0.000 0.000 17664 NA12883 NA12885 0.854 0.002 0.144 0.145 17660 NA12877 NA12889 0.998 0.002 0.000 0.001 17668
Do you only have members of the CEU pedigree in this dataset, or are there unrelated samples as well. Perhaps you could send me your final vcf so I could look at it? We have analyzed this pedigree before and accurately determined their relatedness (in our preprint).
On Fri, Oct 19, 2018 at 8:27 AM mcetincompbio notifications@github.com wrote:
Really appreciate your help!
I did that, and got the following result:
Ind1 Ind2 k0_hat k1_hat k2_hat pi_HAT nbSNP NA12885 NA12889 1.000 0.000 0.000 0.000 17665 NA12883 NA12890 1.000 0.000 0.000 0.000 17655 NA12889 NA12890 1.000 0.000 0.000 0.000 17656 NA12885 NA12890 1.000 0.000 0.000 0.000 17653 NA12877 NA12890 0.998 0.001 0.001 0.001 17661 NA12883 NA12889 0.999 0.001 0.000 0.000 17663 NA12877 NA12885 0.999 0.001 0.000 0.000 17667 NA12877 NA12883 0.999 0.001 0.000 0.000 17664 NA12883 NA12885 0.854 0.002 0.144 0.145 17660 NA12877 NA12889 0.998 0.002 0.000 0.001 17668
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/COMBINE-lab/maximum-likelihood-relatedness-estimation/issues/15#issuecomment-431346067, or mute the thread https://github.com/notifications/unsubscribe-auth/AM8_k6nYpOl6z29Qux0q9uULU5gF3es0ks5umcUegaJpZM4WvtuR .
-- Krishna R Veeramah, Ph.D. Assistant Professor Dept of Ecology and Evolution Stony Brook University Rm 616, 650 Life Sciences Building Stony Brook NY 11794-5245 Office: 631-632-1101 Cell: 510-207-1424 E-mail: krishna.veeramah@stonybrook.edu http://life.bio.sunysb.edu/ee/veeramahlab
I did read your preprint, and that makes me think I am doing something wrong.
Here is the final vcf. It only has the 5 members of the CEPH pedigree. http://www.rapidshare.com.cn/J8cKZph
Many thanks!
Hi,
Were you able to look at the file?
I am running the following command: python SNPbam2vcf.py ancAnatBam.list BonBarTep.vcf 1000G_D100KApart.list
ancAnatBam.list contains the paths of a list of Bam files. 1000G_D100KApart.list is the snp list in the format requested.
I get this error:
Traceback (most recent call last): File "SNPbam2vcf.py", line 247, in
sample_name.append(samfiles[g].header['RG'][0]['SM'])
KeyError: 'RG'
Please respond.