VUmcCGP / wisecondor

WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR): Detect fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.

Create reference #34

Closed chaizuolin closed 6 years ago

chaizuolin commented 7 years ago

Hello, I created the reference with 1000 samples. Why does the plot show only gray regions? [image]

dridk commented 7 years ago

Chromosomes 13, 14, 15, 21 and 22 are acrocentric. That means there is no DNA content in the short arm. Centromeric regions are also gray, as they have no DNA content either.

chaizuolin commented 7 years ago

What I mean is: I created the reference with 1000 samples. Why are all chromosomes shown as unmappable regions? [image] But when I created the reference with 700 samples, this came out: [image] I assumed that the more samples used to create the reference, the better. How many samples do you recommend for creating a reference?

dvanbeek commented 7 years ago

Hi chaizuolin,

I vaguely remember seeing such a plot before. If I remember correctly, the reference in that case was built using the raw .pickle files instead of the GC-corrected ones (.gcc). Normally the flow of building a WISECONDOR reference is like this (also described in the wiki):

  1. .bam -> .pickle, using: samtools rmdup -s sample.bam - | samtools view - -q 1 | python consam.py -outfile sample.pickle
  2. .pickle -> .gcc, using: python ./gcc.py ./in/sample.pickle ./ref/gccount ./in/sample.gcc
  3. All .gcc files of the reference samples in one directory (in line below ./in/refs/) -> one new reference set file (in line below ./ref/reference), using: python newref.py ./in/refs/ ./ref/reference
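The three steps above can be sketched as a small Python helper that only builds the command strings and does not run anything. The commands are the ones quoted in the steps; the function name `build_reference_commands` and the sample names are hypothetical, and the default paths are just the example paths from the list. As a simplification, the sketch writes each .gcc file straight into the reference directory instead of `./in/`.

```python
import shlex


def build_reference_commands(samples, in_dir="./in", refs_dir="./in/refs",
                             gccount="./ref/gccount", reference="./ref/reference"):
    """Return the shell commands for steps 1-3 as strings, in order.

    Hypothetical helper: it only assembles the commands quoted in the
    steps above; nothing is executed here.
    """
    commands = []
    for name in samples:
        bam = shlex.quote(f"{in_dir}/{name}.bam")
        pickle_file = shlex.quote(f"{in_dir}/{name}.pickle")
        gcc_file = shlex.quote(f"{refs_dir}/{name}.gcc")
        # Step 1: .bam -> .pickle
        commands.append(
            f"samtools rmdup -s {bam} - | samtools view - -q 1 | "
            f"python consam.py -outfile {pickle_file}"
        )
        # Step 2: .pickle -> .gcc (the GC correction that must not be skipped)
        commands.append(f"python ./gcc.py {pickle_file} {gccount} {gcc_file}")
    # Step 3: all .gcc files in refs_dir -> one reference set file
    commands.append(
        f"python newref.py {shlex.quote(refs_dir + '/')} {shlex.quote(reference)}"
    )
    return commands


if __name__ == "__main__":
    for command in build_reference_commands(["sample1", "sample2"]):
        print(command)
```

Printing the commands first makes it easy to see whether step 2 is present for every sample before anything is run.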

Could you double check if you indeed performed step 2?
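One quick way to check this is to list anything in the reference directory that is not a .gcc file. This is only a minimal sketch: the helper name `find_non_gcc_files` is hypothetical, and it assumes your files keep the .pickle and .gcc extensions used in the steps above. It makes no assumption about how newref.py itself treats other files.

```python
from pathlib import Path


def find_non_gcc_files(refs_dir):
    """Return the sorted names of regular files in refs_dir that do not end in .gcc."""
    return sorted(
        path.name
        for path in Path(refs_dir).iterdir()
        if path.is_file() and path.suffix != ".gcc"
    )


if __name__ == "__main__":
    # ./in/refs/ is the example reference directory from step 3.
    refs = Path("./in/refs/")
    if refs.is_dir():
        leftovers = find_non_gcc_files(refs)
        if leftovers:
            print("Not GC-corrected (step 2 missing?):", ", ".join(leftovers))
        else:
            print("Only .gcc files found.")
```

If raw .pickle files show up in that list, they were probably used for the reference without GC correction.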

chaizuolin commented 7 years ago

Good, thank you very much. How many samples do you recommend for creating the reference, and how much data should each sample contain?

dvanbeek commented 7 years ago

Hi chaizuolin,

Did my previous comment solve your issue with the reference set of 1000 samples?

Regarding your follow-up question, please see the wiki sections "What do I put in for reference data?" and "How many reads do I need for analysis?".

It seems to me that you are using the legacy version of WISECONDOR. Did you also try the newest version (master)? The new version should allow you to look more closely at the signal of the samples (we've seen a higher resolution when using the new version). Because of your large number of samples, you should definitely see an improvement. Running WISECONDOR is slightly different when using the new version, so please let me know if you need additional help.

chaizuolin commented 7 years ago

Hello. I found the section "How many reads do I need for analysis?" in the wiki, but I can't find the RETRO filter parameter mentioned in that section.

dvanbeek commented 7 years ago

Hi chaizuolin,

You can find and set the parameters for the RETRO filter in the consam.py file: -retdist and -retthres. The default value of both parameters is 4. In the Supplementary data of the WISECONDOR paper you can read up on how the RETRO filter works (and find details about the parameters).
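As a rough illustration, the consam.py call from step 1 could be built with the two RETRO parameters set explicitly. The helper name `consam_command` is hypothetical, and the sketch assumes that each flag takes its integer value as the next argument; please check consam.py itself for the exact usage.

```python
import shlex


def consam_command(outfile, retdist=4, retthres=4):
    """Build a consam.py command string with explicit RETRO filter settings.

    Assumption: -retdist and -retthres each take an integer as the next
    argument. The defaults of 4 are the ones mentioned above. Nothing is
    executed here; consam.py reads its input from the samtools pipe.
    """
    return (
        f"python consam.py -outfile {shlex.quote(outfile)} "
        f"-retdist {int(retdist)} -retthres {int(retthres)}"
    )


if __name__ == "__main__":
    print(consam_command("sample.pickle"))
    print(consam_command("sample.pickle", retdist=6, retthres=5))
```

The resulting string would replace the last part of the step 1 pipeline, after `samtools view - -q 1 |`.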

Did the information in this issue help solve your problem with the gray regions?