Illumina / GTCtoVCF

Script to convert GTC/BPM files to VCF
Apache License 2.0

Now supports extracting gencall scores (GCALL) as real numbers #74

Open Ahhgust opened 1 year ago

Ahhgust commented 1 year ago

I added support for exporting an extra field: the GenCall score (without Phred scaling, so as to have better precision). (I use git forking mechanics once in a blue moon, so apologies in advance!)

While this is not the place for it, it would also be handy to export the gentrain score, but I could not find programmatic access to it.

Thanks! -August

jzieve commented 1 year ago

This is essentially what it used to do. I'm thinking it may be better to just change GS back to what you have here with GCALL, as the phred-scaled score may be useful for parsing via bcftools but is not actually a good metric for GS.
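To illustrate the precision concern, here is a minimal Python sketch of Phred-scaling a GenCall score. It assumes the usual Phred transform Q = -10 * log10(1 - p) with the GenCall score treated as p, integer rounding, and a cap of 50; the helper name `phred_scale` and the cap are assumptions for illustration, and the tool's actual formula may differ.

```python
import math


def phred_scale(gencall_score, max_q=50):
    """Hypothetical helper: Phred-scale a GenCall score in [0, 1].

    Assumes the score can be read as a probability of a correct call,
    which is an illustration only, not the tool's documented behavior.
    """
    error = 1.0 - gencall_score
    if error <= 0.0:
        # A score of 1.0 would give an infinite Phred value, so cap it.
        return max_q
    return min(max_q, int(round(-10.0 * math.log10(error))))


# Distinct raw scores can collapse onto the same integer Phred value.
for score in (0.90, 0.91, 0.92):
    print(score, phred_scale(score))
```

Under these assumptions, raw scores of 0.90 and 0.91 both round to the same integer, which is the kind of information loss a real-valued field avoids.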

For the gentrain score, that is encoded in the cluster file (egt) and out of the scope of this tool. This tool may help you out: https://github.com/Illumina/BeadArrayFiles/blob/develop/examples/locus_summary.py

Ahhgust commented 1 year ago

Thanks for pointing me to the locus_summary script. I'll definitely check it out. And yes, at least for us, we really wanted access to the raw value. The current GQ tag is likely fine for most folks (though some purists might object, as it's not quite the same thing as the genotype quality). Regardless, I appreciate your help, your maintaining this tool, and your responding so quickly! -August

(Sent by email in reply to jzieve's comment of Thursday, April 27, 2023, 1:32 PM, which appears in full above.)

jzieve commented 1 year ago

@Ahhgust Quick update: I was advised to point you to Illumina's C#/dotnet core implementation of GTCtoVCF here: https://support.illumina.com/array/array_software/ima-array-analysis-cli.html (i.e. see latest README about semi-archived state). The 2.0 version of that software should reflect the GS score as you prefer (i.e. not phred-scaled). Hopefully that helps meet your needs.

Ahhgust commented 1 year ago

Thanks! FYSA, there may be a bug in the current build. See:

[pasted content not preserved]

If you look carefully, you'll see that the SampleID is missing. This is using Array Analysis CLI 2.1.0 with the following command:

    array-analysis-cli genotype gtc-to-vcf \
        --csv-manifest /eva/edatums/reference_materials/snp_panel_info/GSA/GRCh38/Manifests/GSA-24v3-0_A2.csv \
        --genome-fasta-file /eva/edatums/reference_materials/reference_genomes/grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
        --unsquash-duplicates \
        --gtc-folder GTC \
        --bpm-manifest /eva/edatums/reference_materials/snp_panel_info/GSA/GRCh38/Manifests/GSA-24v3-0_A2.bpm \
        --output-folder VCFs/

-August

(Sent by email in reply to jzieve's comment of Friday, May 5, 2023, 1:06 PM, which appears in full above.)

jzieve commented 1 year ago

Thanks for bringing this to my attention, but sorry, I'm not following... Did you paste something? Where is the SampleID missing? Also, how were the GTCs generated? That might help track down the bug. I think we've seen something similar from Beeline-generated GTCs, but it seemed to be OK when the IDAT->GTC conversion was done via Array Analysis CLI.
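For anyone trying to pin down where a sample ID goes missing in a converted VCF, one place to look is the `#CHROM` header line, where sample names occupy the columns after `FORMAT`. A minimal Python sketch, assuming uncompressed, tab-delimited VCF text; the function name `sample_ids_from_header` is hypothetical and not part of GTCtoVCF or Array Analysis CLI:

```python
def sample_ids_from_header(lines):
    """Return the sample names found on the #CHROM header line.

    `lines` is any iterable of VCF text lines (for example, an open file).
    The first nine columns are fixed (CHROM through FORMAT); everything
    after them is a sample name.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\r\n").split("\t")
            return columns[9:]
        if not line.startswith("#"):
            # Data records started without a #CHROM line being seen.
            break
    raise ValueError("no #CHROM header line found")


def missing_sample_ids(lines):
    """Return the positions of sample columns whose name is empty."""
    samples = sample_ids_from_header(lines)
    return [index for index, name in enumerate(samples) if not name.strip()]
```

Calling `missing_sample_ids(open(path))` on an output VCF returns an empty list when every sample column is named, and the offending column positions otherwise.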