WGLab / PennCNV

Copy number variation detection from SNP arrays
http://penncnv.openbioinformatics.org

hmm file for chicken custom designed array #13

Closed hongenxu closed 6 years ago

hongenxu commented 7 years ago

Hi Dr. Wang,

I want to use PennCNV to call CNVs from a custom-designed chicken Axiom array (96 CEL files). I followed the supplementary file from the study "Cognitive Performance Among Carriers of Pathogenic Copy Number Variants: Analysis of 152,000 UK Biobank Subjects". My problem is that I do not have an HMM file for the CNV calling step. Could you please suggest how to create one?

Best, Hongen

kaichop commented 7 years ago

You can just use the affygw6.hmm file.

hongenxu commented 7 years ago

Hi Dr. Wang,

Thanks for your help. I ran detect_cnv.pl with the following parameters:

perl ~/PennCNV/detect_cnv.pl -test -hmm ../gw6/lib/affygw6.hmm -pfb mdv.pfb -list germlinelistfile -log penncnv_detect_cnv.log -out mdv.rawcnv

The error messages are as follows:

NOTICE: All program notification/warning messages that appear in STDERR will be also written to log file penncnv_detect_cnv.log
NOTICE: Reading marker coordinates and population frequency of B allele (PFB) from mdv.pfb ... Done with 5572 records (1213 records in chr Z,UNK were discarded)
NOTICE: Reading LRR and BAF values for from MDV.B05_WP-Cheng_101-child-3_P004_B5 ... Done with 5572 records in 28 chromosomes (1213 records are discarded due to lack of PFB information for the markers)
NOTICE: Data from chromosome 23,24,25,26,27,28 will not be used in analysis
NOTICE: Median-adjusting LRR values for all autosome markers from MDV.B05_WP-Cheng_101-child-3_P004_B5 by -0.0084
NOTICE: Median-adjusting BAF values for all autosome markers from MDV.B05_WP-Cheng_101-child-3_P004_B5 by 0.0012
NOTICE: quality summary for MDV.B05_WP-Cheng_101-child-3_P004_B5: LRR_mean=-0.0027 LRR_median=0.0000 LRR_SD=0.1709 BAF_mean=0.4966 BAF_median=0.5000 BAF_SD=0.0914 BAF_DRIFT=0.006638 WF=-0.0183 GCWF=-0.0085
WARNING: Sample from MDV.B05_WP-Cheng_101-child-3_P004_B5 does not pass default quality control criteria due to its drifting BAF values (drift=0.00663839993147458)!
WARNING: Small-sized CNV calls may not be reliable and should be interpreted with caution!
Segmentation fault (core dumped)

Chickens have ZW sex chromosomes, and it seems that PennCNV cannot recognize them. Also, many of my samples fail quality control because of drifting BAF values. How can I solve this problem? Should I adjust the values in the hmm file?
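When many samples fail QC, it can help to tabulate the quality summary values from the log instead of reading them by eye. The following is a minimal Python sketch, assuming the log file keeps one message per line and uses the `NOTICE: quality summary for <sample>: KEY=VALUE ...` layout seen in this thread. The function name `parse_quality_summaries` is illustrative and not part of PennCNV.

```python
import re

# Assumed layout, taken from the log in this thread:
#   NOTICE: quality summary for <sample>: LRR_mean=... BAF_DRIFT=... WF=...
SUMMARY_RE = re.compile(r"quality summary for (?P<sample>\S+): (?P<fields>.*)")
FIELD_RE = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_quality_summaries(log_text):
    """Return {sample: {metric: float}} for every quality summary line."""
    result = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            fields = FIELD_RE.findall(match.group("fields"))
            result[match.group("sample")] = {k: float(v) for k, v in fields}
    return result
```

Calling `parse_quality_summaries(open("penncnv_detect_cnv.log").read())` would give one dictionary per sample, which can then be sorted by `BAF_DRIFT` or `LRR_SD` to see how far the failing samples are from the rest.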

Best, Hongen

kaichop commented 7 years ago

The segmentation fault is a compilation error. You need to recompile the source code. The BAF drift message is just a warning that you can ignore. -Kai

hongenxu commented 7 years ago

Hi Dr. Wang,

Sorry to trouble you again. I recompiled the source code, but the "core dumped" error remains.

I ran detect_cnv.pl with the following parameters:

perl ~/PennCNV/detect_cnv.pl -test -hmm ../gw6/lib/affygw6.hmm -pfb mdv.pfb -list germlinelistfile -log penncnv_detect_cnv.log -out mdv.rawcnv

The output file mdv.rawcnv is empty and the log messages are:

NOTICE: All program notification/warning messages that appear in STDERR will be also written to log file penncnv_detect_cnv.log
NOTICE: Reading marker coordinates and population frequency of B allele (PFB) from mdv.pfb ... Done with 5971 records (814 records in chr UNK were discarded)
NOTICE: Reading LRR and BAF values for from MDV.B05_WP-Cheng_101-child-3_P004_B5 ... Done with 5971 records in 29 chromosomes (814 records are discarded due to lack of PFB information for the markers)
NOTICE: Data from chromosome 23,24,25,26,27,28,X will not be used in analysis
NOTICE: Median-adjusting LRR values for all autosome markers from MDV.B05_WP-Cheng_101-child-3_P004_B5 by -0.0084
NOTICE: Median-adjusting BAF values for all autosome markers from MDV.B05_WP-Cheng_101-child-3_P004_B5 by 0.0012
NOTICE: quality summary for MDV.B05_WP-Cheng_101-child-3_P004_B5: LRR_mean=-0.0027 LRR_median=0.0000 LRR_SD=0.1709 BAF_mean=0.4966 BAF_median=0.5000 BAF_SD=0.0914 BAF_DRIFT=0.006638 WF=-0.0183 GCWF=-0.0045
WARNING: Sample from MDV.B05_WP-Cheng_101-child-3_P004_B5 does not pass default quality control criteria due to its drifting BAF values (drift=0.00663839993147458)!
WARNING: Small-sized CNV calls may not be reliable and should be interpreted with caution!
Segmentation fault (core dumped)

I changed the "Z" chromosome in chicken to "X". Now detect_cnv.pl can read the probes on chromosome "X", but the log says that data from chromosome 23,24,25,26,27,28,X will not be used in analysis. Did the probes on these chromosomes fail sample quality control, or is it something else?
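The Z-to-X renaming described here can be scripted so that it is reproducible. This is a minimal Python sketch, assuming the PFB file is tab-delimited with the chromosome in the second column (Name, Chr, Position, PFB); that layout is an assumption and should be checked against the actual file. The helper names and the output file name are hypothetical.

```python
def rename_chromosome(lines, old="Z", new="X", chr_column=1):
    """Yield tab-delimited rows with `old` replaced by `new` in one column.

    chr_column=1 assumes the chromosome is the second column of the PFB file.
    Rows whose chromosome is not exactly `old` (including a header row) are
    passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chr_column and fields[chr_column] == old:
            fields[chr_column] = new
        yield "\t".join(fields)


def rewrite_pfb(src_path, dst_path, old="Z", new="X"):
    """Write a copy of src_path with the chromosome label renamed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for row in rename_chromosome(src, old, new):
            dst.write(row + "\n")
```

For example, `rewrite_pfb("mdv.pfb", "mdv.chrX.pfb")` would leave the original file untouched and write the renamed copy next to it.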

Thank you, Hongen

5407503938 commented 7 years ago

@kaichop Hi Dr. Wang, I am facing the same problem. You said it is a compilation error and that the source code needs to be recompiled. I tried to find a solution but failed. How should I recompile the source code to fix this kind of error? Thanks.

kaichop commented 7 years ago

If you want to analyze chrX CNVs, you need to add the -chrx argument. For chicken, you should also add the -lastchr argument to indicate which one is the last chromosome (for human, the default is 22).
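To make the suggestion concrete, here is a small Python sketch that assembles the command line from earlier in the thread with the two extra arguments. The function name is hypothetical, and the value 28 for -lastchr is only a guess based on the chromosomes listed in the log; use whatever the last numbered chromosome on your array is. Whether -chrx belongs in the same run as the autosomes or in a separate run is not stated in this thread, so check the detect_cnv.pl documentation before combining them.

```python
import shlex


def build_detect_cnv_command(script, hmm, pfb, list_file, log, out,
                             lastchr=None, chrx=False):
    """Assemble a detect_cnv.pl command line as a list of arguments.

    Only arguments that appear in this thread are used; nothing is executed.
    """
    cmd = ["perl", script, "-test", "-hmm", hmm, "-pfb", pfb,
           "-list", list_file, "-log", log, "-out", out]
    if lastchr is not None:
        cmd += ["-lastchr", str(lastchr)]
    if chrx:
        cmd.append("-chrx")
    return cmd


# File names are copied from the command earlier in the thread.
# lastchr=28 is an assumption, not a confirmed value for this array.
autosome_cmd = build_detect_cnv_command(
    "detect_cnv.pl", "../gw6/lib/affygw6.hmm", "mdv.pfb", "germlinelistfile",
    "penncnv_detect_cnv.log", "mdv.rawcnv", lastchr=28)
print(" ".join(shlex.quote(arg) for arg in autosome_cmd))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting problems.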

For the compilation problem, please refer to the website. If compilation does not work, you need to just install a new perl yourself (such as 5.10 or 5.8), and then use your own perl to compile. The latest version (5.14 or higher) tends to have some problems in compilation for some users.

kaichop commented 7 years ago

For the compilation problem, please refer to the website. If compilation does not work, you need to just install a new perl yourself (such as 5.10 or 5.8), and then use your own perl to compile. The latest version (5.14 or higher) tends to have some problems in compilation for some users.

kaichop commented 7 years ago

In addition, it was reported that compilation fails on GCC5 for some users. You must use GCC4 for the compilation in that case.

hongenxu commented 7 years ago

Hi Dr. Wang,

Thanks for your help. The system-wide perl version is 5.18.4. Running "make" in the kext directory produced no errors, but detect_cnv.pl still ended with "Segmentation fault (core dumped)".

I also installed local versions of Perl (5.8, 5.10, 5.12). In the compile step, all of them fail with errors:

gcc `perl -MExtUtils::Embed -e ccopts` -fPIC   -c -o khmm_wrap.o khmm_wrap.c
gcc `perl -MExtUtils::Embed -e ccopts` -fPIC   -c -o khmm.o khmm.c
gcc `perl -MExtUtils::Embed -e ccopts` -fPIC   -c -o kc.o kc.c
gcc `perl -MExtUtils::Embed -e ccopts` -fPIC   -c -o khmmDev.o khmmDev.c
gcc -shared -o khmm.so khmm_wrap.o khmm.o kc.o khmmDev.o `perl -MExtUtils::Embed -e ldopts` 
/usr/bin/ld: /home/users/xu/perl5/perlbrew/perls/perl-5.8.9/lib/5.8.9/x86_64-linux/CORE/libperl.a(gv.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/home/users/xu/perl5/perlbrew/perls/perl-5.8.9/lib/5.8.9/x86_64-linux/CORE/libperl.a: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
make: *** [khmm.so] Error 1
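
The linker message above says that objects inside the static libperl.a were not built as position-independent code, so they cannot be linked into the shared object khmm.so. One way to see how a given perl was built is `perl -V:useshrplib`, which prints a line such as `useshrplib='true';`. The Python sketch below wraps that check; the function names are illustrative. Rebuilding the local perl with a shared libperl (or with -fPIC) is a commonly suggested remedy for this class of error, but it was not confirmed in this thread.

```python
import re
import subprocess


def parse_useshrplib(config_output):
    """Return True if `perl -V:useshrplib` output reports a shared libperl."""
    match = re.search(r"useshrplib='([^']*)'", config_output)
    return match is not None and match.group(1) == "true"


def perl_has_shared_libperl(perl="perl"):
    """Run the given perl binary and report whether it uses a shared libperl."""
    completed = subprocess.run([perl, "-V:useshrplib"],
                               capture_output=True, text=True, check=True)
    return parse_useshrplib(completed.stdout)
```

Passing the full path of a perlbrew-installed binary as `perl` checks that specific build rather than the system-wide one.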

kaichop commented 7 years ago

Did you try "make clean" first?

Also make sure to use gcc4, since several people reported that gcc5 has issues with recompilation.

hongenxu commented 7 years ago

Yes, I ran "make clean" before "make". My gcc version is 4.8.3. I will try to install PennCNV on another computer.

Thanks for your help.

kaichop commented 7 years ago

You can try running it on a much smaller data set. If that works, the problem is probably a lack of memory on your system.
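A quick way to try a smaller data set is to cut the sample list down to a few entries. This is a minimal Python sketch, assuming the file passed to -list (germlinelistfile in the command earlier in the thread) holds one signal file path per line. The function name and the output file name are hypothetical.

```python
def first_n_samples(list_path, out_path, n=5):
    """Copy the first n non-empty lines of a sample list file to out_path.

    Returns the number of lines written.
    """
    kept = []
    with open(list_path) as src:
        for line in src:
            if len(kept) >= n:
                break
            if line.strip():
                # Make sure every kept entry ends with exactly one newline.
                kept.append(line if line.endswith("\n") else line + "\n")
    with open(out_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

For example, `first_n_samples("germlinelistfile", "germlinelistfile.small", n=3)` writes a three-sample list that can be passed to -list in place of the full one.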

ghost commented 7 years ago

Hi Dr. Wang, I am now working on array data from the Illumina OmniExpressExome platform. May I know if I can directly use the hmm file constructed by Szatkiewicz et al. on the PennCNV website? Many thanks.

ghost commented 7 years ago

By the way, I created the gcmodel file following the instructions on the website (steps below), but it does not work and raises the error "Error: input to reg_linear() should be two reference to arrays".
1) Create the map file from the final report generated by GenomeStudio.
2) Calculate the GC content around each SNP site (plus/minus 500 kb) using an in-house script.
I have filtered out the SNPs located on chr0 and chrMT. Is this the cause of the error? Do you know how to fix this problem? Thank you!
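For reference, the windowed GC calculation in step 2 can be sketched in a few lines of Python. This is not the in-house script mentioned above, only an illustration under stated assumptions: the chromosome sequence is already loaded as a string, positions are 0-based, and ambiguous bases such as N are ignored. How the result must be written out for PennCNV is not covered here; compare against the gcmodel file supplied with PennCNV.

```python
def gc_percent(sequence, center, flank):
    """Percent of G/C among called bases in a window around `center`.

    The window is sequence[center - flank : center + flank + 1], clipped at
    the ends of the sequence. `center` is 0-based (an assumption; many
    marker files use 1-based positions). Returns None if the window holds
    no A/C/G/T bases at all.
    """
    start = max(0, center - flank)
    window = sequence[start:center + flank + 1].upper()
    called = sum(window.count(base) for base in "ACGT")
    if called == 0:
        return None
    gc = window.count("G") + window.count("C")
    return 100.0 * gc / called
```

With the plus/minus 500 kb window from the description, the call would look like `gc_percent(chrom_seq, snp_pos, 500_000)`, where `chrom_seq` and `snp_pos` are placeholders for your own data.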

kaichop commented 7 years ago

Yes, you can use the hmm file constructed by Szatkiewicz et al. directly.

kaichop commented 7 years ago

It seems to be a format problem. You may want to check whether the format is identical to that of the gcmodel file supplied with PennCNV. In addition, you did not show any command, so I cannot tell exactly what you did before you saw the error.

ghost commented 7 years ago

@kaichop Thank you, I really appreciate your help. I have learned from the PennCNV website how to call CNVs in trio families. But when I ran both modes on one trio family, the number of CNVs differed greatly. Following the example commands, 818 CNVs were detected in trio mode and 2312 in joint mode; 711 of them were detected in both modes. Could you let me know which set of results I should use in my downstream analysis? The PennCNV website says the algorithm used in joint mode is described in a NAR paper. Could you let me know which paper that is? Thank you!
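The counts quoted here can be turned into overlap figures with simple arithmetic. The Python sketch below only restates the three numbers from this comment; the function name is illustrative, and how "detected in both modes" was defined (exact boundaries or partial overlap) is not stated in the thread.

```python
def overlap_summary(trio_total, joint_total, shared):
    """Summarize how two CNV call sets overlap, given their sizes."""
    return {
        "trio_only": trio_total - shared,
        "joint_only": joint_total - shared,
        "fraction_of_trio_shared": shared / trio_total,
        "fraction_of_joint_shared": shared / joint_total,
    }


# Counts taken from the comment above: 818 trio, 2312 joint, 711 in both.
summary = overlap_summary(818, 2312, 711)
print(summary)
```

With these counts, 107 calls are unique to trio mode and 1601 are unique to joint mode, so about 87% of the trio calls are also found in joint mode, while only about 31% of the joint calls are found in trio mode.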

kaichop commented 7 years ago

Either can be used. Joint mode is more sensitive, so smaller CNVs will be found. The reference is on the home page.

zy8281263 commented 6 years ago

@kaichop Hi Dr. Wang. I have an intensity file from a GSA array. Which .hmm file can I use, or how can I generate an hmm file?

kaichop commented 6 years ago

hhall.hmm can be used.
