NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin are recorded in this repo.

Tools: Compare the similarity between UCSC v38 bed file and UCSC hg19 bed file #33

Open lxwgcool opened 3 years ago

lxwgcool commented 3 years ago

Original question from Kristie

How similar are the UCSC v38 and hg19 BED files in terms of total genomic area covered?

Strategy

  1. Prepare a merged UCSC BED file for both v38 and hg19
  2. Use the UCSC liftOver tool to convert the regions in the v38 BED file to hg19 coordinates
  3. Compute the overlap between the lifted-over v38 BED file and the hg19 BED file
  4. Output the summary statistics
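
Step 1 depends on each BED file being merged, so that no base is counted twice when total covered area is computed. The following is a minimal sketch of that merge, not the script used in this repo. It assumes plain three-column BED input with 0-based, half-open coordinates; the function names `merge_bed_intervals` and `total_bases` are hypothetical.

```python
from collections import defaultdict


def merge_bed_intervals(lines):
    """Merge overlapping or touching BED intervals per chromosome.

    Assumes BED coordinates are 0-based and half-open: [start, end).
    Returns a dict: chromosome -> sorted list of (start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the usual BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def total_bases(merged):
    """Total number of bases covered by merged intervals."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)


# Hypothetical example records, not taken from the real BED files.
example = ["chr1\t100\t200", "chr1\t150\t300", "chr1\t400\t500", "chr2\t0\t50"]
merged = merge_bed_intervals(example)
print(merged["chr1"])       # [(100, 300), (400, 500)]
print(total_bases(merged))  # 350
```
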
lxwgcool commented 3 years ago

How to obtain the liftOver tool

  1. Read this overview first: https://genviz.org/module-01-intro/0001/06/02/liftoverTools/

  2. Download the liftOver executable. Directory: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ Executable: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver

  3. Download the chain file for hg38 to hg19. Directory: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/ Chain file: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz

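  4. Run it. The arguments are, in order: the input BED file, the chain file, the output BED file, and a file that receives the regions that could not be lifted over.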

    ./liftOver ../BedFileForRef38_CCDS.MergedOverlap.Brief.bed ./hg38ToHg19.over.chain.gz v38_to_hg19_liftover.bed unlifted.bed

    Also check: https://genome.sph.umich.edu/wiki/LiftOver
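
Regions that fail to lift over are not included in the overlap comparison, so it helps to see how many were dropped and why. The sketch below counts the records in the unlifted file by reason. It assumes the usual layout of that file, where each unmapped record is preceded by a `#` comment line giving the reason (for example `#Deleted in new`); the function name `summarize_unlifted` and the sample records are hypothetical.

```python
from collections import Counter


def summarize_unlifted(lines):
    """Count unmapped BED records per reason.

    Assumes each record is preceded by a '#' comment line stating why
    it could not be lifted over.
    """
    reasons = Counter()
    current = "unknown"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the reason for the record(s) that follow.
            current = line.lstrip("#").strip()
        else:
            reasons[current] += 1
    return reasons


# Hypothetical sample content, not real output from this analysis.
sample = [
    "#Deleted in new",
    "chr1\t1000\t2000",
    "#Partially deleted in new",
    "chr2\t500\t900",
    "#Deleted in new",
    "chr3\t10\t20",
]
summary = summarize_unlifted(sample)
for reason, count in summary.most_common():
    print(f"{reason}: {count}")
```

To apply it to a real run, pass an open file handle for `unlifted.bed` in place of `sample`.
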

Related paths on the CCAD cluster

  1. Liftover tools /DCEG/Projects/Exome/annotation/Exome_capture_targets/UCSC/LiftOver

  2. Converted result /DCEG/Projects/Exome/annotation/Exome_capture_targets/UCSC/LiftOver/v38_to_hg19

  3. v38 merged UCSC BED file /DCEG/Projects/Exome/annotation/Exome_capture_targets/UCSC/v38

lxwgcool commented 3 years ago

The code that will be used for overlap comparison

  1. BedDiffCheck.py

Input and output

  1. Input

    • First argument: the hg19 BED file. Example: /home/lixin/lxwg/Test/CGR/UCSC/Test_Hg19/hg19_cds.MergedOverlap.Brief.bed
    • Second argument: the v38 BED file after liftover to hg19. Example: /home/lixin/lxwg/Test/CGR/UCSC/LiftOver/v38_to_hg19_liftover.bed
  2. Output: comparison statistics. Example:

    
    chrM
    chrUn_gl000228
    ************** Summary *****************
    Total Overlap Base: 32809891
    hg19 Base Num     : 34990773  --> Overlap/Total_hg19: 93.8%
    v38 Base Num      : 33409462  --> Overlap/Total_v38 : 98.2%
    ************** ******* *****************

    ======== Details ========
    chr1 (OverlapBase): 3326575
    hg19 Base Num: 3546300 --> Overlap/Base_hg19: 93.8%
    v38 Base Num : 3673694 --> Overlap/Base_v38 : 90.6%

    chr5 (OverlapBase): 1540506
    hg19 Base Num: 1613830 --> Overlap/Base_hg19: 95.5%
    v38 Base Num : 1547297 --> Overlap/Base_v38 : 99.6%

    chr2 (OverlapBase): 2421028
    hg19 Base Num: 2528098 --> Overlap/Base_hg19: 95.8%
    v38 Base Num : 2445954 --> Overlap/Base_v38 : 99.0%

    chr3 (OverlapBase): 1887076
    hg19 Base Num: 1995750 --> Overlap/Base_hg19: 94.6%
    v38 Base Num : 1892120 --> Overlap/Base_v38 : 99.7%

    chr4 (OverlapBase): 1308566
    hg19 Base Num: 1381345 --> Overlap/Base_hg19: 94.7%
    v38 Base Num : 1329502 --> Overlap/Base_v38 : 98.4%

    chr6 (OverlapBase): 1678300
    hg19 Base Num: 1787120 --> Overlap/Base_hg19: 93.9%
    v38 Base Num : 1686219 --> Overlap/Base_v38 : 99.5%

    chrY (OverlapBase): 88492
    hg19 Base Num: 100559 --> Overlap/Base_hg19: 88.0%
    v38 Base Num : 89049 --> Overlap/Base_v38 : 99.4%

    chr7 (OverlapBase): 1525018
    hg19 Base Num: 1670824 --> Overlap/Base_hg19: 91.3%
    v38 Base Num : 1541084 --> Overlap/Base_v38 : 99.0%

    chr8 (OverlapBase): 1084671
    hg19 Base Num: 1187257 --> Overlap/Base_hg19: 91.4%
    v38 Base Num : 1098844 --> Overlap/Base_v38 : 98.7%

    chr9 (OverlapBase): 1309482
    hg19 Base Num: 1430994 --> Overlap/Base_hg19: 91.5%
    v38 Base Num : 1317985 --> Overlap/Base_v38 : 99.4%

    chrX (OverlapBase): 1254477
    hg19 Base Num: 1311806 --> Overlap/Base_hg19: 95.6%
    v38 Base Num : 1282971 --> Overlap/Base_v38 : 97.8%

    chr10 (OverlapBase): 1281963
    hg19 Base Num: 1388808 --> Overlap/Base_hg19: 92.3%
    v38 Base Num : 1294953 --> Overlap/Base_v38 : 99.0%

    chr11 (OverlapBase): 1901662
    hg19 Base Num: 2053557 --> Overlap/Base_hg19: 92.6%
    v38 Base Num : 1920342 --> Overlap/Base_v38 : 99.0%

    chr12 (OverlapBase): 1705991
    hg19 Base Num: 1811802 --> Overlap/Base_hg19: 94.2%
    v38 Base Num : 1716877 --> Overlap/Base_v38 : 99.4%

    chr13 (OverlapBase): 599725
    hg19 Base Num: 634004 --> Overlap/Base_hg19: 94.6%
    v38 Base Num : 603014 --> Overlap/Base_v38 : 99.5%

    chr14 (OverlapBase): 1036763
    hg19 Base Num: 1104591 --> Overlap/Base_hg19: 93.9%
    v38 Base Num : 1046676 --> Overlap/Base_v38 : 99.1%

    chr15 (OverlapBase): 1134628
    hg19 Base Num: 1219357 --> Overlap/Base_hg19: 93.1%
    v38 Base Num : 1154916 --> Overlap/Base_v38 : 98.2%

    chr16 (OverlapBase): 1360229
    hg19 Base Num: 1487027 --> Overlap/Base_hg19: 91.5%
    v38 Base Num : 1367578 --> Overlap/Base_v38 : 99.5%

    chr17 (OverlapBase): 1884936
    hg19 Base Num: 1998242 --> Overlap/Base_hg19: 94.3%
    v38 Base Num : 1893915 --> Overlap/Base_v38 : 99.5%

    chr18 (OverlapBase): 521467
    hg19 Base Num: 542245 --> Overlap/Base_hg19: 96.2%
    v38 Base Num : 525707 --> Overlap/Base_v38 : 99.2%

    chr19 (OverlapBase): 2159446
    hg19 Base Num: 2288932 --> Overlap/Base_hg19: 94.3%
    v38 Base Num : 2172324 --> Overlap/Base_v38 : 99.4%

    chr20 (OverlapBase): 790971
    hg19 Base Num: 835169 --> Overlap/Base_hg19: 94.7%
    v38 Base Num : 793673 --> Overlap/Base_v38 : 99.7%

    chr21 (OverlapBase): 328801
    hg19 Base Num: 339359 --> Overlap/Base_hg19: 96.9%
    v38 Base Num : 332056 --> Overlap/Base_v38 : 99.0%

    chr22 (OverlapBase): 679118
    hg19 Base Num: 733272 --> Overlap/Base_hg19: 92.6%
    v38 Base Num : 681437 --> Overlap/Base_v38 : 99.7%

    chrM (OverlapBase): 0
    hg19 Base Num: 525 --> Overlap/Base_hg19: 0.0%
    v38 Base Num : 0 --> Overlap/Base_v38 : 0

    chrUn_gl000228 (OverlapBase): 0
    hg19 Base Num: 0 --> Overlap/Base_hg19: 0
    v38 Base Num : 1275 --> Overlap/Base_v38 : 0.0%

Good to know! 26 Everything is all set!
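
The per-chromosome figures above are the overlap in bases divided by each file's own base count. The following sketch shows one way such numbers could be computed. It is not the source of BedDiffCheck.py, which is not shown in this thread. It assumes each input is a dict mapping a chromosome to a sorted list of merged, half-open `(start, end)` intervals; the names `overlap_bases`, `percent`, and `compare` are hypothetical.

```python
def overlap_bases(a, b):
    """Bases shared by two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def percent(part, whole):
    """Format part/whole as a percentage; print a bare 0 when whole is 0."""
    return f"{100.0 * part / whole:.1f}%" if whole else "0"


def compare(hg19, v38):
    """Per-chromosome (overlap, hg19 bases, v38 bases, % of hg19, % of v38)."""
    report = {}
    for chrom in sorted(set(hg19) | set(v38)):
        a, b = hg19.get(chrom, []), v38.get(chrom, [])
        ov = overlap_bases(a, b)
        n19 = sum(e - s for s, e in a)
        n38 = sum(e - s for s, e in b)
        report[chrom] = (ov, n19, n38, percent(ov, n19), percent(ov, n38))
    return report


# Hypothetical toy intervals, not the real data.
hg19 = {"chr1": [(100, 300), (400, 500)], "chrM": [(0, 525)]}
v38 = {"chr1": [(150, 350), (450, 600)]}
report = compare(hg19, v38)
print(report["chr1"])  # (200, 300, 350, '66.7%', '57.1%')
print(report["chrM"])  # (0, 525, 0, '0.0%', '0')

# The totals in the summary above reproduce with the same arithmetic.
print(percent(32809891, 34990773))  # 93.8%
print(percent(32809891, 33409462))  # 98.2%
```

The two-pointer sweep visits each interval once, so the comparison is linear in the number of regions once both files are sorted and merged.
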