CCRGeneticsBranch / Oncogenomics_v2

Oncogenomics portal version 2

Migrate AVIA to opencravat (add SpliceAI) #18

Open hsienchao opened 1 year ago

hsienchao commented 1 year ago

Comments from Anney:

After we reannotated all ~3 million khanlab variants, about 0.2% (6443/3235826) did not have an OC annotation. This is because these positions did not map to HG38 (OC performs the liftover first when HG19 variants are given).

In oncosnpprod, if you run the following query, you will get all the samples that have these variants:

select * from ( (select original_input__chrom, original_input__pos, original_input__REF_BASE, original_input__alt_BASE from hg19_annot_oc@aviap_lnk where base__so is null) a join var_sample_avia b on a.original_input__chrom = b.CHROMOSOME and a.original_input__pos = b.START_POS and a.original_input__REF_BASE = b.ref and a.original_input__alt_BASE = b.alt );
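
To try the logic of this join without access to oncosnpprod, the query can be mirrored on a small in-memory SQLite database. This is only a sketch: the database link is dropped, the `sample_id` column and all inserted rows are made up for illustration, and the OC column names are assumed to use the double-underscore form (`original_input__chrom`, `base__so`).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy versions of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    conn.executescript(
        """
        CREATE TABLE hg19_annot_oc (
            original_input__chrom TEXT,
            original_input__pos INTEGER,
            original_input__ref_base TEXT,
            original_input__alt_base TEXT,
            base__so TEXT
        );
        CREATE TABLE var_sample_avia (
            sample_id TEXT,  -- hypothetical column, for illustration only
            chromosome TEXT,
            start_pos INTEGER,
            ref TEXT,
            alt TEXT
        );
        -- Made-up rows: the chr2 variant has no OC annotation.
        INSERT INTO hg19_annot_oc VALUES
            ('chr1', 100, 'A', 'G', 'missense_variant'),
            ('chr2', 200, 'C', 'T', NULL);
        INSERT INTO var_sample_avia VALUES
            ('S1', 'chr1', 100, 'A', 'G'),
            ('S2', 'chr2', 200, 'C', 'T'),
            ('S3', 'chr2', 200, 'C', 'T');
        """
    )
    return conn


def samples_with_unannotated_variants(conn):
    """Return sample rows whose variant has a NULL base__so in the OC table."""
    query = """
        SELECT *
        FROM (
            SELECT original_input__chrom, original_input__pos,
                   original_input__ref_base, original_input__alt_base
            FROM hg19_annot_oc
            WHERE base__so IS NULL
        ) a
        JOIN var_sample_avia b
          ON a.original_input__chrom = b.chromosome
         AND a.original_input__pos = b.start_pos
         AND a.original_input__ref_base = b.ref
         AND a.original_input__alt_base = b.alt
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in samples_with_unannotated_variants(build_demo_db()):
        print(row["sample_id"], row["chromosome"], row["start_pos"])
```

Each sample carrying an unannotated variant appears once in the result, so a variant shared by several samples yields several rows.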

hsienchao commented 1 year ago

Sample: CL0049_N1D_E2_HGM3YBGXY Case: OM16-008-FFPE Type: Germline

Category | Both | AVIA_Only | OC_Only | AVIA only reason | OC only reason
-- | -- | -- | -- | -- | --
Total | 747 | 49 | 1 |  | Variant is classified synonymous in AVIA
Tier 1.0 | 1 | 0 | 0 |  |
Tier 1.1 | 0 | 1 | 0 | Clinvar definition change: chr12:65563608-65563631 GCCGCGGGACCAGCGGCGGCGGCG->-. Tier 1.1 -> Tier 1.3 |
Tier 1.2 | 0 | 0 | 0 |  |
Tier 1.3 | 1 | 0 | 1 |  | See Tier 1.1
Tier 2 | 16 | 1 | 4 | CHR8:37555933-37555933 -->CG not found in OC | CHR19:33444628-33444643 TGTCCTCTTCGTCCCC->- etc.: MAF changed
Tier 3 | 9 | 3 | 1 | chr7:2265161-2265161 G>A: HGMD missing; chr1:152285077-152285080: Clinvar changed | chr19:33444607-33444607 T>G: MAF changed
Tier 4 | 40 | 5 | 3 |  |
No tier | 665 | 54 | 6 |  |

hsienchao commented 1 year ago

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.MAF.pdf

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.Reported.pdf

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.zip

hsienchao commented 1 year ago

Sample: CL0049_T1D_E2_HGM3YBGXY Case:OM16-008-FFPE Type: Somatic

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 15544 | 178 | 34
Tier 1.0 | 0 | 0 | 0
Tier 1.1 | 4 | 0 | 0
Tier 1.2 | 21 | 5 | 3
Tier 1.3 | 13 | 2 | 0
Tier 2 | 70 | 2 | 2
Tier 3 | 1069 | 24 | 2
Tier 4 | 14141 | 269 | 53
No tier | 77 | 13 | 111

hsienchao commented 1 year ago

CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.MAF.pdf

CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.Reported.pdf

[CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.zip](https://github.com/CCRGeneticsBranch/Oncogenomics_v2/files/11349190/CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.zip)

hsienchao commented 1 year ago

The gene page is working now

hsienchao commented 1 year ago

Added the annotation description to detailed page

cheanney commented 1 year ago

HGMD in OC is version 2021, while AVIA uses version 2022. We need to update HGMD to 2022 in OC.

cheanney commented 1 year ago

Found the reason for the count differences: the version of CBIO in OC is newer than the one in AVIA.

hsienchao commented 1 year ago

There was a configuration error on my end that failed to add ICGC to the select list. I’ve fixed this, and this is the new plot:

Image

I also found that some MAF values on dev are larger than 1.

Some examples:

Chromosome | Start | End | Ref | Alt | MAF
-- | -- | -- | -- | -- | --
chr1 | 24417419 | 24417419 | T | C | 144
chr11 | 35747665 | 35747665 | A | G | 54
chr12 | 132497581 | 132497581 | G | A | 31
chr7 | 123593764 | 123593764 | T | C | 169
chr22 | 20131213 | 20131213 | G | A | 11
chr7 | 107820819 | 107820819 | T | C | 52
chr1 | 169484767 | 169484767 | A | G | 393

Query example:

select maf from hg19_annot_oc@aviad_lnk where chr='1' and query_start=24417419
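
Since a MAF is a frequency, any value outside the range 0 to 1 points to a problem upstream. A check along these lines could flag such rows after they are fetched. This is a minimal Python sketch: the dictionary keys (`chrom`, `start`, `maf`) and the function name are hypothetical, the first two rows reuse values reported in this thread, and the third uses a MAF.OC value quoted later in the thread.

```python
def find_invalid_mafs(rows):
    """Return the rows whose MAF is present but not a valid frequency in [0, 1]."""
    return [
        row for row in rows
        if row["maf"] is not None and not 0.0 <= row["maf"] <= 1.0
    ]


variants = [
    {"chrom": "chr1", "start": 24417419, "ref": "T", "alt": "C", "maf": 144},
    {"chrom": "chr11", "start": 35747665, "ref": "A", "alt": "G", "maf": 54},
    {"chrom": "chr19", "start": 33444611, "ref": "T", "alt": "-", "maf": 0.000266},
]

if __name__ == "__main__":
    for row in find_invalid_mafs(variants):
        print(f'{row["chrom"]}:{row["start"]} has out-of-range MAF {row["maf"]}')
```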

cheanney commented 1 year ago

When creating the MAF column, the allele counts from UK10K were included, such as UK10K_COHORT__UK10K_TWINS_AC, UK10K_COHORT__UK10K_ALSPAC_AC, and UK10K_COHORT__UK10K_AC. I will remove these columns from the MAF calculation in the MV.
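
The effect of mixing allele counts into the calculation can be illustrated with a small Python sketch. It assumes that MAF is taken as the maximum over the population frequency columns, which this thread does not state, and it treats any column whose name ends in `_AC` as an allele count. The `POP_A__AF` and `POP_B__AF` columns and all numeric values are hypothetical.

```python
def is_allele_count_column(name):
    """Treat columns ending in _AC (e.g. UK10K_COHORT__UK10K_AC) as allele counts."""
    return name.upper().endswith("_AC")


def compute_maf(annotation):
    """Return the largest frequency among non-count columns, or None if there is none.

    Assumption: MAF is the maximum across frequency columns.
    """
    freqs = [
        value for column, value in annotation.items()
        if value is not None and not is_allele_count_column(column)
    ]
    return max(freqs) if freqs else None


# Illustrative annotation row; all values are made up.
example = {
    "POP_A__AF": 0.012,
    "POP_B__AF": 0.02,
    "UK10K_COHORT__UK10K_TWINS_AC": 37,
    "UK10K_COHORT__UK10K_ALSPAC_AC": 52,
    "UK10K_COHORT__UK10K_AC": 89,
}

if __name__ == "__main__":
    print("max over all columns:", max(example.values()))
    print("max over frequency columns only:", compute_maf(example))
```

With the count columns included, the maximum is an integer count rather than a frequency, which would explain MAF values above 1.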

cheanney commented 1 year ago

The variant chr7:2265161-2265161 was missing its HGMD annotation. It now has an HGMD annotation.

Screenshot 2023-06-21 at 10 27 30 AM
cheanney commented 1 year ago

The VAR_SAMPLE_AVIA_OC_FIXED table in oncosnpdev now has HGMD_2022 and CBIO_2021, which match AVIA's versions. The MAF issue is fixed as well.

hsienchao commented 1 year ago

I tested VAR_SAMPLE_AVIA_OC_FIXED. We found that the reported discrepancy was caused by the genie column, which is truncated, so I now use genie__count instead. The results are now consistent with the old AVIA:

Germline:

Image

Somatic:

Image

hsienchao commented 1 year ago

The summary of tier comparison of VAR_SAMPLE_AVIA_OC_FIXED:

Germline:

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 747 | 49 | 1
Tier 1.0 | 1 | 0 | 0
Tier 1.1 | 0 | 1 | 1
Tier 1.2 | 0 | 0 | 0
Tier 1.3 | 1 | 0 | 0
Tier 2 | 16 | 1 | 5
Tier 3 | 10 | 2 | 1
Tier 4 | 40 | 5 | 2
No tier | 665 | 54 | 6

Somatic:

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 15544 | 178 | 34
Tier 1.0 | 0 | 0 | 0
Tier 1.1 | 4 | 0 | 0
Tier 1.2 | 21 | 5 | 0
Tier 1.3 | 13 | 2 | 0
Tier 2 | 70 | 2 | 2
Tier 3 | 1072 | 21 | 2
Tier 4 | 14141 | 269 | 53
No tier | 77 | 13 | 111
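
For reference, per-tier Both / AVIA_Only / OC_Only counts like these could be tallied along the following lines. This is a sketch under one assumption: a variant counts as Both when AVIA and OC assign it the same tier, and otherwise it counts as AVIA_Only under AVIA's tier and OC_Only under OC's tier, which matches the Tier 1.1 -> Tier 1.3 case noted earlier in this thread. The function name and the toy inputs are hypothetical, and the actual portal logic is not shown here.

```python
from collections import Counter


def compare_tiers(avia, oc):
    """Tally per-tier agreement between two {variant: tier} mappings.

    Returns {tier: Counter} with keys 'both', 'avia_only' and 'oc_only'.
    """
    summary = {}
    for variant in set(avia) | set(oc):
        avia_tier = avia.get(variant)
        oc_tier = oc.get(variant)
        if avia_tier == oc_tier:
            summary.setdefault(avia_tier, Counter())["both"] += 1
            continue
        # Tiers disagree, or the variant is present in only one annotation set.
        if avia_tier is not None:
            summary.setdefault(avia_tier, Counter())["avia_only"] += 1
        if oc_tier is not None:
            summary.setdefault(oc_tier, Counter())["oc_only"] += 1
    return summary


# Toy input: the chr12 variant moves from Tier 1.1 (AVIA) to Tier 1.3 (OC).
avia_tiers = {
    "chr12:65563608-65563631": "Tier 1.1",
    "variant_a": "Tier 2",
    "variant_b": "Tier 4",
}
oc_tiers = {
    "chr12:65563608-65563631": "Tier 1.3",
    "variant_a": "Tier 2",
}

if __name__ == "__main__":
    for tier, counts in sorted(compare_tiers(avia_tiers, oc_tiers).items()):
        print(tier, counts["both"], counts["avia_only"], counts["oc_only"])
```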

The previous HGMD difference is gone. The remaining differences are caused by (1) MAF and (2) Clinvar (germline).

Germline:

Image

Somatic:

Image

Tier2 defined in OC only (due to MAF difference):

Chr | Start | End | Ref | Alt | MAF.OC | MAF.AVIA
-- | -- | -- | -- | -- | -- | --
chr19 | 33444628 | 33444643 | TGTCCTCTTCGTCCCC | - | 0.0014 | 0.375
chr19 | 33444610 | 33444611 | AT | - | 0.0022 | 0.125
chr19 | 33444611 | 33444611 | T | - | 0.000266 | 0.125
chr19 | 33444613 | 33444615 | ACA | - | 0.000132 | 0.125

chr19:33444613-33444615 has no Gnomad3 data.

hsienchao commented 1 year ago

Here are some comments to start with.

  1. What is the meaning of -1 in the frequency? https://watch.screencastify.com/v/a1su4Ixvv8CDr99LEY5F
  2. In Open Cravat, it would be good to show which gene you are looking at: https://watch.screencastify.com/v/POTWIavaFZ04VC7Ue0SH
  3. For cells with a lot of data, can you fix the column size? https://watch.screencastify.com/v/eQo6D7POLceEXSaGtge1
  4. It is not clear what some of the data means. See: https://watch.screencastify.com/v/LbF3ZSD4BiREHFoalYSI (Low priority?)
  5. Is it possible to directly link to some of the items in the tables e.g. https://watch.screencastify.com/v/Bj1F12zxyS1daitmnFSE (This is a lot of work, so maybe lower priority).

Javed

hsienchao commented 10 months ago

I started the migration on staging server: https://fsabcl-onc01t.ncifcrf.gov/clinomics/

The new variants have been placed at /mnt/projects/CCR-JK-oncogenomics/static/site_data/prod/avia/hg19/new_variants.tsv. Anney is working on processing the new variants. Once the whole workflow is tested, we can start the official migration.

hsienchao commented 10 months ago

Anney has finished the AVIA file exchange cron jobs. I now place new_variants.tsv daily and will test how it works. If everything looks good, we should be ready for the migration.