apriha / snps

tools for reading, writing, merging, and remapping SNPs
BSD 3-Clause "New" or "Revised" License

VCF / GVCF parsing issue #67

Open deniseho-98 opened 4 years ago

deniseho-98 commented 4 years ago

Hi. I'm trying out the code. I understand that the raw files here are genotype files from DTC companies. However, all I have are VCF and gVCF files from an in-house sequencing platform. Can these be used? If so, how? I have zero knowledge of coding. Thanks a lot!

deniseho-98 commented 4 years ago

I computed the cMs in shared_DNA_one_chrom.excel. The total comes to 3600 cM, which DNA Painter classifies as a parent/child relationship. But the two people are totally unrelated. Could anyone help?

apriha commented 4 years ago

Hi, thanks for the note. Yes, VCF and GVCF files should work if the SNPs are annotated with RSIDs. The files can be loaded as shown in the examples (https://lineage.readthedocs.io/en/latest/readme.html#load-raw-data).
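For anyone unsure what "annotated with RSIDs" means in practice, here is a minimal sketch using only the Python standard library; it is not how snps itself parses VCFs. A VCF data line has tab-separated columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample, and only records whose ID column holds an rsID can be matched by name. The helper `read_rsid_genotypes`, the single-sample assumption, and the records in `example_vcf` are all made up for illustration.

```python
import io


def read_rsid_genotypes(lines):
    """Return {rsid: genotype} for the first sample of an uncompressed VCF.

    Hypothetical helper for illustration only: assumes one sample column and
    skips records whose ID column does not hold an rsID.
    """
    genotypes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no sample column on this line
        rsid, ref, alt = fields[2], fields[3], fields[4]
        if not rsid.startswith("rs"):
            continue  # ID is "." or not an rsID, so it cannot be matched by name
        fmt = fields[8].split(":")
        sample = fields[9].split(":")
        gt = sample[fmt.index("GT")]  # assumes a GT field is present
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
        # GT holds allele indexes separated by "/" (unphased) or "|" (phased)
        calls = gt.replace("|", "/").split("/")
        if any(c == "." for c in calls):
            continue  # missing genotype call
        genotypes[rsid] = "".join(alleles[int(c)] for c in calls)
    return genotypes


# Made-up records for illustration only.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n"
    "1\t100\trs1001\tG\tA\t50\tPASS\t.\tGT\t0/1\n"
    "1\t200\trs1002\tA\tG\t50\tPASS\t.\tGT:DP\t1|1:20\n"
    "1\t300\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\n"
    "1\t400\trs1003\tT\tC\t50\tPASS\t.\tGT\t./.\n"
)

print(read_rsid_genotypes(io.StringIO(example_vcf)))
```

This prints `{'rs1001': 'GA', 'rs1002': 'GG'}`: the record without an rsID and the missing call are both dropped.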

As for the issue with total shared cMs = 3600, was this for output generated by lineage?

deniseho-98 commented 4 years ago

Dear Apriha/Lineage,

Thank you so much for replying.

The output from lineage is an Excel file, and one of its columns is cMs (please see attached). The total shared cMs is calculated by adding all the values, right? That's how I obtained ~3600 cM. When I entered this value in DNA Painter, it showed a parent/child relationship, but the two individuals are actually husband and wife.

Can you please help?

Thank you.

Sincerely, Chai San

apriha commented 4 years ago

Hi Chai, sorry, the file didn't come through. But yes, you're correct that the total shared cMs is calculated by adding all values in the cMs column of the shared_dna output files.
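As a minimal sketch of that calculation, the snippet below sums a cMs column from CSV text with Python's `csv` module. The header and rows in `sample` are made up, and the column name `cMs` is an assumption; check the header of your own shared_dna output file and adjust `column` if it differs.

```python
import csv
import io


def total_shared_cm(csv_text, column="cMs"):
    """Sum the centiMorgan values of every shared segment.

    The column name "cMs" is an assumption; change it to match the header
    of the actual output file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


# Made-up segments for illustration only.
sample = (
    "segment,chrom,start,end,cMs,snps\n"
    "1,1,1000000,25000000,40.5,5000\n"
    "2,7,3000000,9000000,12.25,1200\n"
)

print(total_shared_cm(sample))  # 52.75
```

To use a real file, pass its contents instead of `sample`, e.g. `open(path, newline="").read()`.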

But I suspect that lineage is finding spurious matches because of the way snps parses the genotypes in the VCF / GVCF... You said this was an in-house sequencing platform; is that platform also creating the VCF / GVCF? Thanks again.

willgdjones commented 4 years ago

The package accepts VCF files, but it has not been tested on gVCF files! You will most likely run out of RAM if you try to load a gVCF.
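One reason a gVCF is so much heavier than a VCF is that, besides variant sites, it contains reference-block records covering the rest of the genome; their ALT column holds only symbolic alleles such as `<NON_REF>` (GATK) or `<*>`. A minimal sketch, assuming that convention and not part of snps, of streaming records one line at a time and dropping those reference blocks before any further processing. The name `variant_lines` and the records in `example_gvcf` are hypothetical.

```python
import io
from typing import Iterable, Iterator


def variant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield gVCF data lines that carry at least one concrete ALT allele.

    Hypothetical helper: reference blocks, whose ALT column holds only
    symbolic alleles such as <NON_REF> or <*>, are skipped. Lines are
    yielded one at a time, so the whole file is never held in memory.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # blank or malformed line
        alts = fields[4].split(",")
        # keep the record if any ALT allele is a real (non-symbolic) allele
        if any(a != "." and not a.startswith("<") for a in alts):
            yield line


# Made-up gVCF records for illustration only.
example_gvcf = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "1\t10001\t.\tT\t<NON_REF>\t.\t.\tEND=10500\tGT\t0/0\n"
    "1\t10501\trs1001\tG\tA,<NON_REF>\t50\t.\t.\tGT\t0/1\n"
    "1\t10502\t.\tC\t<*>\t.\t.\tEND=20000\tGT\t0/0\n"
)

kept = list(variant_lines(io.StringIO(example_gvcf)))
print(len(kept))  # 1
```

For a real file, pass an open handle such as `open(path)`, or `gzip.open(path, "rt")` for a compressed `.g.vcf.gz`; both are read lazily. Dropping reference blocks also drops the homozygous-reference calls they encode, so this trades information for memory rather than being a free fix.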
