WGLab / PennCNV

Copy number variation detection from SNP arrays
http://penncnv.openbioinformatics.org
Other

PennCNV affy X chromosome missing #16

Closed: jongleur2056 closed this issue 6 years ago

jongleur2056 commented 7 years ago

Hi there, I was following the PennCNV-Affy tutorial. After "Step 2: Split the signal file into individual files for CNV calling by PennCNV", I ran "wc -l file.split1" to check the number of lines and found that every individual file had 1,401,380 lines instead of the roughly 1.8 million lines expected for the Affy 6.0 array.

I then ran "tail file.split1" and found that the last line was CN_922408 22 49578524 -0.0502 2, a chromosome 22 marker; there were no X chromosome markers.

I ran "wc -l gw6.lrr_baf.txt" and found that it also had only 1,401,380 lines. What could I have done wrong to lose about 400,000 probesets per sample?
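A quick way to see which chromosomes actually made it into a signal file is to count markers per chromosome rather than total lines. The following is a minimal Python sketch, not part of PennCNV; it assumes a tab-delimited file with one header row and the chromosome label in the second column (Name, Chr, Position, ...), matching the split-file line quoted above, and it assumes chrX is labeled "X". The function name `markers_per_chromosome` is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def markers_per_chromosome(lines: Iterable[str], chr_col: int = 1) -> Counter:
    """Count markers per chromosome in a tab-delimited signal file.

    Assumption: the first line is a header and the chromosome label sits in
    column `chr_col` (0-based, so 1 means the second column, "Chr").
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    counts: Counter = Counter()
    for row in reader:
        if len(row) > chr_col:  # ignore blank or truncated lines
            counts[row[chr_col]] += 1
    return counts
```

For example, passing an open handle for gw6.lrr_baf.txt and reading `counts.get("X", 0)` from the result shows directly whether any chrX markers survived, instead of inferring it from `tail`.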

kaichop commented 7 years ago

What array is this? Are you sure that the array and the annotation file contain chrX markers?

If you check the original APT output, do you see chrX markers? We want to determine whether this is an APT problem or a PennCNV-Affy problem.

(Sent by email in reply to jongleur2056's message of Fri, Sep 15, 2017 at 12:44 AM.)

jongleur2056 commented 7 years ago

Thanks for your quick response. I used the Affymetrix Genome-Wide SNP 6.0 array, and I copied the following five commands for the Genome-Wide 6.0 array from Steps 1.1, 1.2, 1.3, 1.4, and 2 of the PennCNV-Affy tutorial. I assume the annotation file should be GenomeWideSNP_6.cdf? Which original APT output file should I check: "birdseed.calls.txt", "birdseed.confidences.txt", or "quant-norm.pm-only.med-polish.expr.summary.txt"? Many thanks.

[kai@cc ~/]$ apt-probeset-genotype -c lib/GenomeWideSNP_6.cdf -a birdseed --read-models-birdseed lib/GenomeWideSNP_6.birdseed.models --special-snps lib/GenomeWideSNP_6.specialSNPs --out-dir apt --cel-files listfile

[kai@cc ~/]$ apt-probeset-summarize --cdf-file lib/GenomeWideSNP_6.cdf --analysis quant-norm.sketch=50000,pm-only,med-polish,expr.genotype=true --target-sketch lib/hapmap.quant-norm.normalization-target.txt --out-dir apt --cel-files listfile

[kai@cc ~/]$ generate_affy_geno_cluster.pl birdseed.calls.txt birdseed.confidences.txt quant-norm.pm-only.med-polish.expr.summary.txt -locfile ../lib/affygw6.hg18.pfb -sexfile file_sex -out gw6.genocluster

[kai@cc ~/]$ normalize_affy_geno_cluster.pl gw6.genocluster quant-norm.pm-only.med-polish.expr.summary.txt -locfile ../lib/affygw6.hg18.pfb -out gw6.lrr_baf.txt

kcolumn.pl gw6.lrr_baf.txt split 2 -tab -head 3 -name -out gw6
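The split step keeps the leading annotation columns and gives each sample its own pair of signal columns, so every split file should have exactly as many lines as gw6.lrr_baf.txt. Below is a minimal Python sketch of that idea under an assumed layout (3 head columns, then one LRR/BAF pair per sample); it is not kcolumn.pl's implementation and does not reproduce its `-name` handling. `split_columns` is a hypothetical name.

```python
from typing import List


def split_columns(rows: List[List[str]], head: int = 3, width: int = 2) -> List[List[List[str]]]:
    """Split a wide signal table into one table per sample.

    Each output table keeps the first `head` columns (assumed Name, Chr,
    Position) plus the next `width` columns (assumed one sample's LRR, BAF).
    Rows are never dropped, so every output has len(rows) rows.
    """
    n_extra = len(rows[0]) - head
    if n_extra % width:
        raise ValueError("sample columns are not a multiple of the split width")
    tables = []
    for start in range(head, head + n_extra, width):
        tables.append([row[:head] + row[start:start + width] for row in rows])
    return tables
```

Because splitting only selects columns, chrX being absent from every split file points to a step before the split, which matches the identical line counts reported for gw6.lrr_baf.txt and file.split1.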

jongleur2056 commented 7 years ago

Hi, I came across the following paper claiming that "Specifically, data for chromosome X and Y were not shown because PennCNV-Affy didn’t carry sex chromosome information". Is it true that PennCNV-Affy omits chrX?

BMC Bioinformatics. 2014. Evaluation of copy number variation detection for a SNP array platform. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-50

kaichop commented 7 years ago

Yes, those are the APT output files. You can check whether the chrX SNPs are there; they should be.
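APT output files list probeset IDs but carry no chromosome column, so one way to run this check is to cross-reference the IDs against the chrX marker names in the PFB file. This is a minimal Python sketch with hypothetical function names; it assumes a tab-delimited PFB with a header row (Name, Chr, Position, PFB) and chrX labeled "X", APT files whose comment lines start with `#` and whose header row starts with `probeset_id`, and allele-specific summary rows whose IDs end in `-A` or `-B`.

```python
import re
from typing import Iterable, Set


def chrx_names_from_pfb(lines: Iterable[str]) -> Set[str]:
    """Return marker names whose Chr column is "X" in a PFB file (header skipped)."""
    names = set()
    it = iter(lines)
    next(it, None)  # skip the header row
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == "X":
            names.add(fields[0])
    return names


# Assumption: allele-specific rows in the summary file end in "-A" or "-B".
_ALLELE_SUFFIX = re.compile(r"-[AB]$")


def count_chrx_probesets(apt_lines: Iterable[str], chrx_names: Set[str]) -> int:
    """Count distinct chrX probesets in an APT output file.

    Skips '#' comment lines and the 'probeset_id' header, takes the first
    column, strips the assumed -A/-B allele suffix, and looks it up.
    """
    seen = set()
    for line in apt_lines:
        if line.startswith("#") or line.startswith("probeset_id"):
            continue
        probeset = line.split("\t", 1)[0].strip()
        probeset = _ALLELE_SUFFIX.sub("", probeset)
        if probeset in chrx_names:
            seen.add(probeset)
    return len(seen)
```

Running `count_chrx_probesets` on each of the three APT files with the same PFB-derived set makes it easy to see which output, if any, is missing chrX.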

(Sent by email in reply to jongleur2056's message of Fri, Sep 15, 2017 at 9:56 AM.)

kaichop commented 7 years ago

This is incorrect. PennCNV-Affy can call CNVs on chrX and does not omit chrX. (By default, detect_cnv.pl requires the -chrx argument to call CNVs on chromosome X, and of course the PFB file and the input file must also contain chrX information.)
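Since -chrx only helps when chrX markers are present in both the PFB file and the signal file, a small pre-flight check can catch the problem before detect_cnv.pl is run. This is a hypothetical Python helper, not part of PennCNV; it assumes tab-delimited files with one header row, the chromosome in the second column labeled "X", and that markers are matched between the two files by name.

```python
from typing import Iterable, Set


def chrx_markers(lines: Iterable[str], chr_col: int = 1) -> Set[str]:
    """Collect names (first column) of chrX markers; the header row is skipped."""
    it = iter(lines)
    next(it, None)  # skip the header row
    out = set()
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chr_col and fields[chr_col] == "X":
            out.add(fields[0])
    return out


def chrx_ready(pfb_lines: Iterable[str], signal_lines: Iterable[str]) -> bool:
    """Hypothetical check before calling CNVs with -chrx.

    True only if both files contain chrX markers and they share at least
    one marker name (assumed to be how the two files are matched).
    """
    pfb_x = chrx_markers(pfb_lines)
    sig_x = chrx_markers(signal_lines)
    return bool(pfb_x) and bool(sig_x) and not pfb_x.isdisjoint(sig_x)
```

In the situation described in this thread, the PFB side would pass while the signal file would fail, pointing back to the upstream APT step rather than to detect_cnv.pl.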

(Sent by email in reply to jongleur2056's message of Fri, Sep 15, 2017 at 3:27 PM.)

jongleur2056 commented 7 years ago

Array: Affymetrix Genome-Wide SNP 6.0
Annotation file: affygw6.hg19.pfb, which contains 12,839 chrX probesets

APT output files:
"birdseed.calls.txt" has 909,623 probesets, including 11,553 chrX probesets.
"birdseed.confidences.txt" has 934,968 probesets, including 12,047 chrX probesets.
"quant-norm.pm-only.med-polish.expr.summary.txt" has 1,048,464 probesets and contains 0 chrX probesets.

I used the following command (step 1.2) to generate the quant-norm.pm-only.med-polish.expr.summary.txt file:

[kai@cc ~/]$ apt-probeset-summarize --cdf-file lib/GenomeWideSNP_6.cdf --analysis quant-norm.sketch=50000,pm-only,med-polish,expr.genotype=true --target-sketch lib/hapmap.quant-norm.normalization-target.txt --out-dir apt --cel-files listfile

Which step could have gone wrong and produced 0 chrX probesets in the quant-norm.pm-only.med-polish.expr.summary.txt file?

kaichop commented 7 years ago

I cannot tell, but you should check the APT manual (https://www.affymetrix.com/support/developer/powertools/changelog/apt-probeset-summarize.html) to figure this out. Again, I did not develop APT and cannot advise why. It seems that the genotyping method requires special parameters for chrX: https://www.affymetrix.com/support/developer/powertools/changelog/VIGNETTE-WGSA-special-snps.html

(Sent by email in reply to jongleur2056's message of Sun, Sep 17, 2017 at 9:44 PM.)