CCRGeneticsBranch / Oncogenomics_v2

Oncogenomics portal version 2

Migrate AVIA to opencravat (add SpliceAI) #18

Open hsienchao opened 1 year ago

hsienchao commented 1 year ago

Comments from Anney:

After we reannotated all ~3 million khanlab variants, about 0.2% (6443/3235826) had no OC annotation. This is because these positions did not map to HG38 (OC performs the liftover first when HG19 variants are given).
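As a quick check of the quoted fraction, the two counts from the comment can be divided directly. This is only the arithmetic; the variable names are illustrative.

```python
# Counts quoted in the comment above.
unannotated = 6443
total_variants = 3235826

# Fraction of khanlab variants without an OC annotation.
fraction = unannotated / total_variants
print(f"{fraction:.4%} of variants have no OC annotation")
```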

in oncosnpprod, if you run the following query, you will get all the samples that have these variants:

select * from (select original_input__chrom, original_input__pos, original_input__REF_BASE, original_input__alt_BASE from hg19_annot_oc@aviap_lnk where base__so is null) a join var_sample_avia b on a.original_input__chrom = b.CHROMOSOME and a.original_input__pos = b.START_POS and a.original_input__REF_BASE = b.ref and a.original_input__alt_BASE = b.alt;
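The join above can be tried out locally with a minimal sketch like the following, which uses an in-memory SQLite database instead of the Oracle database link. The table layouts, the double-underscore column names, the `sample_id` column, and all rows are assumptions made up for illustration; they are not taken from oncosnpprod.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified versions of the two tables.
conn.executescript("""
CREATE TABLE hg19_annot_oc (
    original_input__chrom    TEXT,
    original_input__pos      INTEGER,
    original_input__ref_base TEXT,
    original_input__alt_base TEXT,
    base__so                 TEXT
);
CREATE TABLE var_sample_avia (
    sample_id  TEXT,
    chromosome TEXT,
    start_pos  INTEGER,
    ref        TEXT,
    alt        TEXT
);
""")

# Made-up rows: one annotated variant, one without OC annotation.
conn.executemany(
    "INSERT INTO hg19_annot_oc VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 100, "A", "G", "missense_variant"),
        ("chr2", 200, "C", "T", None),
    ],
)
conn.executemany(
    "INSERT INTO var_sample_avia VALUES (?, ?, ?, ?, ?)",
    [
        ("sample_A", "chr1", 100, "A", "G"),
        ("sample_B", "chr2", 200, "C", "T"),
        ("sample_C", "chr2", 200, "C", "T"),
    ],
)

# Same shape as the query above: unannotated variants joined to samples.
query = """
SELECT b.sample_id, b.chromosome, b.start_pos, b.ref, b.alt
FROM (
    SELECT original_input__chrom, original_input__pos,
           original_input__ref_base, original_input__alt_base
    FROM hg19_annot_oc
    WHERE base__so IS NULL
) a
JOIN var_sample_avia b
  ON a.original_input__chrom = b.chromosome
 AND a.original_input__pos = b.start_pos
 AND a.original_input__ref_base = b.ref
 AND a.original_input__alt_base = b.alt
ORDER BY b.sample_id
"""
affected = conn.execute(query).fetchall()
for row in affected:
    print(row)
```

Only the samples carrying the variant with a null `base__so` come back, which is the set that would need review after the liftover failure.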

hsienchao commented 1 year ago
Sample: CL0049_N1D_E2_HGM3YBGXY Case:OM16-008-FFPE Type: Germline

Category | Both | AVIA_Only | OC_Only | AVIA only Reason | OC only Reason
-- | -- | -- | -- | -- | --
Total | 747 | 49 | 1 | | variant is classified synonymous in AVIA
Tier 1.0 | 1 | 0 | 0 | |
Tier 1.1 | 0 | 1 | 0 | Clinvar definition change: chr12:65563608-65563631 GCCGCGGGACCAGCGGCGGCGGCG->-. Tier 1.1 -> Tier 1.3 |
Tier 1.2 | 0 | 0 | 0 | |
Tier 1.3 | 1 | 0 | 1 | | See Tier 1.1
Tier 2 | 16 | 1 | 4 | CHR8:37555933-37555933 -->CG not found in OC | CHR19:33444628-33444643 TGTCCTCTTCGTCCCC->- etc: MAF changed
Tier 3 | 9 | 3 | 1 | chr7:2265161-2265161 G>A: HGMD missing; chr1:152285077-152285080 clinvar changed | chr19:33444607-33444607 T>G: MAF changed
Tier 4 | 40 | 5 | 3 | |
No tier | 665 | 54 | 6 | |
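One way to produce per-tier Both / AVIA_Only / OC_Only counts like these is sketched below. It assumes each annotation source can be reduced to a mapping from a variant key to a tier label, and that a variant counts as "Both" only when the two sources agree on the tier; that reading matches the Tier 1.1 -> Tier 1.3 case noted in the comment, but it is an assumption, not the portal's actual comparison code. The variant keys and tiers in the example are made up.

```python
from collections import Counter


def compare_tiers(avia, oc):
    """Count variants per tier as Both / AVIA_Only / OC_Only.

    avia and oc map a variant key to a tier label (hypothetical inputs).
    """
    counts = {}

    def bump(tier, column):
        counts.setdefault(tier, Counter())[column] += 1

    for variant, tier in avia.items():
        # Same tier in both sources counts once, under "Both".
        column = "Both" if oc.get(variant) == tier else "AVIA_Only"
        bump(tier, column)

    for variant, tier in oc.items():
        # Missing from AVIA, or tiered differently there.
        if avia.get(variant) != tier:
            bump(tier, "OC_Only")

    return counts


# Made-up example: v1 changes tier, v2 agrees, v3 exists only in OC.
avia_tiers = {"v1": "Tier 1.1", "v2": "Tier 4"}
oc_tiers = {"v1": "Tier 1.3", "v2": "Tier 4", "v3": "Tier 2"}

tier_counts = compare_tiers(avia_tiers, oc_tiers)
for tier in sorted(tier_counts):
    row = tier_counts[tier]
    print(tier, row["Both"], row["AVIA_Only"], row["OC_Only"])
```

With this definition a variant whose tier changes shows up twice: once as AVIA_Only under its old tier and once as OC_Only under its new tier.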
hsienchao commented 1 year ago

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.MAF.pdf

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.Reported.pdf

CL0049_N1D_E2_HGM3YBGXY.OM16-008-FFPE.germline.zip

hsienchao commented 1 year ago

Sample: CL0049_T1D_E2_HGM3YBGXY Case:OM16-008-FFPE Type: Somatic

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 15544 | 178 | 34
Tier 1.0 | 0 | 0 | 0
Tier 1.1 | 4 | 0 | 0
Tier 1.2 | 21 | 5 | 3
Tier 1.3 | 13 | 2 | 0
Tier 2 | 70 | 2 | 2
Tier 3 | 1069 | 24 | 2
Tier 4 | 14141 | 269 | 53
No tier | 77 | 13 | 111
hsienchao commented 1 year ago

CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.MAF.pdf

CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.Reported.pdf

[CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.zip](https://github.com/CCRGeneticsBranch/Oncogenomics_v2/files/11349190/CL0049_T1D_E2_HGM3YBGXY.OM16-008-FFPE.somatic.zip)

hsienchao commented 1 year ago

The gene page is working now

hsienchao commented 1 year ago

Added the annotation description to detailed page

cheanney commented 1 year ago

HGMD in OC is version 2021, while AVIA uses version 2022. We need to update HGMD to 2022 in OC.

cheanney commented 1 year ago

Found the reason for the count differences: the version of CBIO in OC is newer than the one in AVIA.

hsienchao commented 1 year ago

There was a configuration error on my end that failed to add ICGC to the select list. I’ve fixed this, and this is the new plot:

Image


I also found some MAF values on dev are larger than 1:

Some examples:

Chromosome | Start | End | Ref | Alt | MAF
-- | -- | -- | -- | -- | --
chr1 | 24417419 | 24417419 | T | C | 144
chr11 | 35747665 | 35747665 | A | G | 54
chr12 | 132497581 | 132497581 | G | A | 31
chr7 | 123593764 | 123593764 | T | C | 169
chr22 | 20131213 | 20131213 | G | A | 11
chr7 | 107820819 | 107820819 | T | C | 52
chr1 | 169484767 | 169484767 | A | G | 393

Query example:

select maf from hg19_annot_oc@aviad_lnk where chr='1' and query_start=24417419
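Since an allele frequency has to lie between 0 and 1, rows like these can be caught with a simple range check. The sketch below is one possible sanity check over rows already fetched from the table; the dictionary layout and the function name are assumptions. The first two rows reuse values reported above, and the third is a made-up in-range row for contrast.

```python
def out_of_range_maf(rows):
    """Return rows whose MAF is present but not within [0, 1]."""
    return [
        row
        for row in rows
        if row["maf"] is not None and not (0.0 <= row["maf"] <= 1.0)
    ]


rows = [
    {"chr": "chr1", "start": 24417419, "maf": 144},    # reported above
    {"chr": "chr11", "start": 35747665, "maf": 54},    # reported above
    {"chr": "chrN", "start": 1, "maf": 0.0123},        # hypothetical valid row
    {"chr": "chrN", "start": 2, "maf": None},          # hypothetical, no MAF
]

suspicious = out_of_range_maf(rows)
for row in suspicious:
    print(row["chr"], row["start"], row["maf"])
```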

cheanney commented 1 year ago

When creating the MAF column, I included the allele counts from UK10K (UK10K_COHORT__UK10K_TWINS_AC, UK10K_COHORT__UK10K_ALSPAC_AC, UK10K_COHORT__UK10K_AC). I will remove these columns from the MAF calculation in the MV.
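The effect described here can be illustrated with a small sketch. It assumes the MAF column is derived as the maximum over a set of per-population columns, which is a guess at the calculation and not confirmed by the thread; the `_AC` suffix filter, the `GNOMAD__AF` column name, and all values are hypothetical.

```python
def max_population_frequency(annotation, skip_suffix="_AC"):
    """Maximum over frequency columns, ignoring allele-count columns.

    annotation maps column name -> value; columns whose name ends with
    skip_suffix are treated as counts and excluded.
    """
    freqs = [
        value
        for name, value in annotation.items()
        if value is not None and not name.upper().endswith(skip_suffix)
    ]
    return max(freqs, default=None)


# Hypothetical annotation row mixing a frequency with an allele count.
annotation = {
    "GNOMAD__AF": 0.01,                   # assumed frequency column
    "UK10K_COHORT__UK10K_TWINS_AC": 37,   # allele count, not a frequency
}

naive_maf = max(v for v in annotation.values() if v is not None)
fixed_maf = max_population_frequency(annotation)
print(naive_maf, fixed_maf)
```

Taking the maximum over every column lets an integer allele count win, which would explain values far above 1; filtering the count columns keeps the result a proper frequency.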

cheanney commented 1 year ago

Missing HGMD annotation for variant chr7:2265161-2265161: this variant now has an HGMD annotation.

Screenshot 2023-06-21 at 10 27 30 AM
cheanney commented 1 year ago

The VAR_SAMPLE_AVIA_OC_FIXED table in oncosnpdev has HGMD_2022 and CBIO_2021, which match AVIA's versions. The MAF issue is fixed as well.

hsienchao commented 1 year ago

Tested VAR_SAMPLE_AVIA_OC_FIXED. We found that the reported discrepancy was caused by the genie column, which is truncated. I now use genie__count instead. The results are now consistent with the old AVIA:

Germline:

Image

Somatic:

Image

hsienchao commented 1 year ago

The summary of tier comparison of VAR_SAMPLE_AVIA_OC_FIXED:

Germline:

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 747 | 49 | 1
Tier 1.0 | 1 | 0 | 0
Tier 1.1 | 0 | 1 | 1
Tier 1.2 | 0 | 0 | 0
Tier 1.3 | 1 | 0 | 0
Tier 2 | 16 | 1 | 5
Tier 3 | 10 | 2 | 1
Tier 4 | 40 | 5 | 2
No tier | 665 | 54 | 6

Somatic:

Category | Both | AVIA_Only | OC_Only
-- | -- | -- | --
Total | 15544 | 178 | 34
Tier 1.0 | 0 | 0 | 0
Tier 1.1 | 4 | 0 | 0
Tier 1.2 | 21 | 5 | 0
Tier 1.3 | 13 | 2 | 0
Tier 2 | 70 | 2 | 2
Tier 3 | 1072 | 21 | 2
Tier 4 | 14141 | 269 | 53
No tier | 77 | 13 | 111

The previous HGMD difference is gone. The remaining differences are caused by (1) MAF and (2) Clinvar (germline).

Germline:

Image

Somatic:

Image

Tier2 defined in OC only (due to MAF difference):

Chr | Start | End | Ref | Alt | MAF.OC | MAF.AVIA
-- | -- | -- | -- | -- | -- | --
chr19 | 33444628 | 33444643 | TGTCCTCTTCGTCCCC | - | 0.0014 | 0.375
chr19 | 33444610 | 33444611 | AT | - | 0.0022 | 0.125
chr19 | 33444611 | 33444611 | T | - | 0.000266 | 0.125
chr19 | 33444613 | 33444615 | ACA | - | 0.000132 | 0.125

chr19:33444613-33444615 has no Gnomad3 data.

hsienchao commented 11 months ago

Here are some comments to start with.

  1. What is the meaning of -1 in the frequency? https://watch.screencastify.com/v/a1su4Ixvv8CDr99LEY5F
  2. In Open Cravat, it would be good to show which gene you are looking at: https://watch.screencastify.com/v/POTWIavaFZ04VC7Ue0SH
  3. For cells with a lot of data, can you fix the column size?: https://watch.screencastify.com/v/eQo6D7POLceEXSaGtge1
  4. It is not clear what some of the data means: See: https://watch.screencastify.com/v/LbF3ZSD4BiREHFoalYSI (? Low priority?)
  5. Is it possible to directly link to some of the items in the tables e.g. https://watch.screencastify.com/v/Bj1F12zxyS1daitmnFSE (This is a lot of work, so maybe lower priority).

Javed

hsienchao commented 9 months ago

I started the migration on the staging server: https://fsabcl-onc01t.ncifcrf.gov/clinomics/

The new variants have been placed on /mnt/projects/CCR-JK-oncogenomics/static/site_data/prod/avia/hg19/new_variants.tsv. Anney is working on processing the new variants. Once the whole workflow is tested, we can start the official migration.

hsienchao commented 8 months ago

Anney has finished the AVIA file exchange cron jobs. I now place new_variants.tsv daily and will test how it works. If everything looks good, we should be ready for the migration.