WGLab / PennCNV

Copy number variation detection from SNP arrays
http://penncnv.openbioinformatics.org

hhall.hg18.pfb and hhall.hg18.gcmodel files not available #63

Open aydzhouyuan opened 3 years ago

aydzhouyuan commented 3 years ago

Dear Kai

I am currently processing a fairly old dataset from the Human670-QuadCustom_v1_A array. I first tried to generate the PFB file from my own data (~270 healthy samples and ~10K cases). I found that the average number of rare CNVs (frequency < 1%) per individual is much higher than in the UKBB dataset.

I suspect this may be because the dataset is skewed, as most samples are cases. Would that influence the PFB file?

I then tried to use hhall.hg18.pfb and hhall.hg18.gcmodel, as Human670 is similar to Human610. According to this page: http://penncnv.openbioinformatics.org/en/latest/misc/faq/, these files appear to be included in the PennCNV package.

However, when I downloaded all the PennCNV releases (1.0.0-1.0.5) from GitHub, neither hhall.hg18.pfb nor hhall.hg18.gcmodel was in the lib folder.

Any idea why, and are you able to share these two files?

Thank you and I am looking forward to hearing from you soon.

Yuan

kaichop commented 3 years ago

You will need to build the PFB file yourself. Essentially, it contains the BAF values for these markers. You can use the compile_pfb.pl program to generate this file from your own input files.
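To illustrate what "BAF values for these markers" means in practice, here is a minimal Python sketch that averages the B allele frequency of each marker across samples. It is not the code of compile_pfb.pl, and it makes no claim about how that program treats special cases; the function name `compute_pfb` and the in-memory input layout (one dict per sample, mapping marker name to BAF) are assumptions made for this example.

```python
from statistics import mean


def compute_pfb(samples):
    """Average BAF per marker across samples (illustrative only).

    samples: list of dicts, one per sample, mapping marker name -> BAF.
             A BAF of None or NaN is treated as missing and skipped.
    Returns a dict mapping marker name -> mean BAF over non-missing samples.
    """
    values = {}
    for sample in samples:
        for marker, baf in sample.items():
            # Skip missing values (None, or NaN which is not equal to itself).
            if baf is None or baf != baf:
                continue
            values.setdefault(marker, []).append(baf)
    return {marker: mean(bafs) for marker, bafs in values.items()}


if __name__ == "__main__":
    # Hypothetical toy data: three samples, two markers.
    toy = [
        {"rs1": 0.0, "rs2": 0.5},
        {"rs1": 1.0, "rs2": 0.5},
        {"rs1": 0.5, "rs2": None},
    ]
    for marker, pfb in sorted(compute_pfb(toy).items()):
        print(marker, round(pfb, 4))
```

For real data, use compile_pfb.pl itself; the sketch only shows why the composition of the sample set can shift the per-marker values.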

aydzhouyuan commented 3 years ago

Thank you very much for your reply. Should I use only the healthy samples or all the samples to generate the PFB file? Would that influence the CNV calling?

kaichop commented 3 years ago

From empirical experience, it does not really matter much. If you have more samples, it is fine to combine them together to generate the file.
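One way to check this on a specific dataset is to build two PFB tables, one from the healthy samples only and one from all samples, and see how far they diverge. The Python sketch below assumes both tables are already loaded as dicts mapping marker name to PFB value; the function name `compare_pfb` and the 0.05 tolerance are arbitrary choices for illustration, not values from PennCNV.

```python
def compare_pfb(pfb_a, pfb_b, tolerance=0.05):
    """Compare two PFB tables on the markers they share (illustrative only).

    pfb_a, pfb_b: dicts mapping marker name -> PFB value.
    tolerance: hypothetical cutoff for flagging a marker as divergent.
    Returns (max_abs_difference, sorted list of markers above tolerance).
    """
    shared = pfb_a.keys() & pfb_b.keys()
    diffs = {marker: abs(pfb_a[marker] - pfb_b[marker]) for marker in shared}
    max_diff = max(diffs.values(), default=0.0)
    divergent = sorted(m for m, d in diffs.items() if d > tolerance)
    return max_diff, divergent


if __name__ == "__main__":
    # Hypothetical toy values, not real array data.
    healthy_only = {"rs1": 0.50, "rs2": 0.20, "rs3": 0.90}
    all_samples = {"rs1": 0.48, "rs2": 0.30}
    max_diff, divergent = compare_pfb(healthy_only, all_samples)
    print("largest difference:", round(max_diff, 4))
    print("markers above tolerance:", divergent)
```

If only a handful of markers differ noticeably, the choice of sample set is unlikely to be what drives a higher rare CNV count.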
