WGLab / PennCNV

Copy number variation detection from SNP arrays
http://penncnv.openbioinformatics.org

empty GC model files #83

Open NTNguyen13 opened 2 years ago

NTNguyen13 commented 2 years ago

Hi, I'm trying to create the gc model file for the GSA v3 microarray in hg38.

I used this command:

cal_gc_snp.pl \
    PennCNV/gc_file/hg38.gc5Base.sorted.txt \
    GSA.snppos.txt \
    --output GSA.gcmodel

The hg38.gc5Base.sorted.txt file is the decompressed file from PennCNV, and GSA.snppos.txt contains the Name, Chr and Position columns. Its first lines look like this:

Name    Chr Position
rs9651229   chr1    632287
rs9701872   chr1    632828
rs11497407  chr1    633147
GSA-rs116587930 chr1    792461
rs3131972   chr1    817341
GSA-rs114525117 chr1    823656
rs12127425  chr1    858952
GSA-rs79373928  chr1    866156
GSA-rs116452738 chr1    899450

However, the command produced an empty GC model file:

NOTICE: Finished reading chr and position information for 653817 markers in 26 chromosomes
NOTICE: Finish processing 0 lines in GC file

Could you please advise me on this issue? Thank you.

varunorama commented 2 years ago

I ran into this issue before and it seems to be because of the genome type. I used the hg19 genome and it worked - I believe the GSAv3 is based on hg19.

Hope this helps.

kaichop commented 2 years ago

Hi Varun, Thank you.

Hi Nguyen, the problem is that UCSC provides the gc5Base file for hg18 and hg19 (for example, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gc5Base.txt.gz), but does not provide one for hg38. The file that you downloaded is probably gc5BaseBw.txt, not gc5Base.txt. The gc5Base.txt file contains GC information for every 5 kb fragment of the human genome; an example from hg19 is below. You could build such a file yourself and then use it in cal_gc_snp.pl.

585 chr1 10000 15120 chr1.0 5 1024 0 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 60360 4024000
585 chr1 15120 20240 chr1.1 5 1024 1024 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 60460 3966800
585 chr1 20240 25360 chr1.2 5 1024 2048 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 54900 3370000
585 chr1 25360 30480 chr1.3 5 1024 3072 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 50080 3089600
585 chr1 30480 35600 chr1.4 5 1024 4096 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 47560 2695200
585 chr1 35600 40720 chr1.5 5 1024 5120 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 50420 2998800
585 chr1 40720 45840 chr1.6 5 1024 6144 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 33600 1520000
585 chr1 45840 50960 chr1.7 5 1024 7168 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 36100 1710800
585 chr1 50960 56080 chr1.8 5 1024 8192 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 38940 1975600
585 chr1 56080 61200 chr1.9 5 1024 9216 /gbdb/hg19/wib/gc5Base.p13.plusMT.wib 0 100 1024 36360 1720800
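
Building such a file by hand could look roughly like the Python sketch below. It assumes that the 14 columns follow the UCSC wiggle table layout (bin, chrom, chromStart, chromEnd, name, span, count, offset, file, lowerLimit, dataRange, validCount, sumData, sumSquares), that each record covers 1024 windows of 5 bases as in the example records, and that the mean GC percentage of a record is sumData / validCount. The function names, the placeholder values written for bin, offset and file, and the way windows containing N are skipped are all assumptions made for illustration, so please check them against what cal_gc_snp.pl actually reads before relying on the output.

```python
SPAN = 5        # bases per window, as in the example records
COUNT = 1024    # windows per record, so each record covers 5120 bases


def gc_windows(seq, span=SPAN):
    """Return the GC percent of each full window, or None if it contains N."""
    values = []
    for i in range(0, len(seq) - span + 1, span):
        window = seq[i:i + span].upper()
        if "N" in window:
            values.append(None)
        else:
            gc = window.count("G") + window.count("C")
            values.append(100.0 * gc / span)
    return values


def gc5base_records(chrom, seq, start=0, span=SPAN, count=COUNT):
    """Yield tab-separated gc5Base-style lines for one chromosome sequence.

    `start` is the 0-based coordinate of the first base of `seq` (assumed).
    """
    values = gc_windows(seq, span)
    for index, first in enumerate(range(0, len(values), count)):
        chunk = values[first:first + count]
        valid = [v for v in chunk if v is not None]
        if not valid:
            continue  # nothing but N in this stretch, so no record is written
        chrom_start = start + first * span
        chrom_end = chrom_start + len(chunk) * span
        fields = [
            0, chrom, chrom_start, chrom_end, f"{chrom}.{index}",  # bin is a placeholder
            span, len(chunk), 0, "NA",                             # offset and file are placeholders
            0, 100, len(valid),                                    # lowerLimit, dataRange, validCount
            round(sum(valid)), round(sum(v * v for v in valid)),   # sumData, sumSquares
        ]
        yield "\t".join(str(f) for f in fields)


def mean_gc(record):
    """Mean GC percent of one tab-separated record: sumData / validCount."""
    cols = record.split("\t")
    return float(cols[12]) / float(cols[11])
```

Applied to the first hg19 record above, mean_gc gives 60360 / 1024, about 58.9 percent. For a real genome you would call gc5base_records once per chromosome of the hg38 FASTA and write the lines in the same chromosome order that your sorted GC file is expected to have.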

lgmgeo commented 9 months ago

Have you checked whether the chromosome names in your hg38.gc5Base.sorted.txt file are written with or without the “chr” prefix? They should use the same convention as the Chr column of your GSA.snppos.txt file.
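
A quick way to compare the two files is sketched below in Python. It assumes the chromosome name is the second whitespace-separated column in both files (as in the GSA.snppos.txt excerpt and the gc5Base example earlier in this thread) and that only the SNP position file has a header line. The function names are made up for this example.

```python
def chrom_names(path, column, skip_header=False):
    """Collect the distinct values found in one whitespace-separated column."""
    names = set()
    with open(path) as handle:
        if skip_header:
            next(handle, None)  # drop the "Name Chr Position" header line
        for line in handle:
            fields = line.split()
            if len(fields) > column:
                names.add(fields[column])
    return names


def prefix_report(gc_path, snp_path):
    """Summarise whether both files name chromosomes the same way."""
    gc = chrom_names(gc_path, 1)                       # chrom column of the GC file (assumed)
    snp = chrom_names(snp_path, 1, skip_header=True)   # Chr column of the SNP position file
    return {
        "gc_has_chr": any(name.startswith("chr") for name in gc),
        "snp_has_chr": any(name.startswith("chr") for name in snp),
        "shared": sorted(gc & snp),
    }
```

Calling prefix_report("hg38.gc5Base.sorted.txt", "GSA.snppos.txt") and finding an empty "shared" list would mean that no chromosome name appears in both files. An empty GC file would show up the same way, with "gc_has_chr" set to False.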