WGLab / PennCNV

Copy number variation detection from SNP arrays
http://penncnv.openbioinformatics.org

PFB missing markers #45

Open andrisvAria opened 5 years ago

andrisvAria commented 5 years ago

Hi, I have generated a PFB file for the Illumina GSA array from 1000 samples. The Illumina GSA has approximately 600 000 markers, but the compiled PFB contains only about 123 000. When I then call detect_cnv, I get this in the log: NOTICE: Done with 122965 records in 24 chromosomes (495575 records are discarded due to lack of PFB information for the markers)

I wonder why I don't get the entire set of markers in the PFB. Is there something missing in my data or is this expected behavior? Thanks

kaichop commented 5 years ago

You must use the PFB file for the GSA array. Any marker not annotated in your PFB file will not be used in the analysis. You did not show your command so I cannot tell.

On Fri, Jun 21, 2019 at 6:48 AM andri90 wrote: [original post quoted in full]
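The log line in the original post already hints at the scale of the mismatch: 122965 used records plus 495575 discarded records is 618540, the marker count later reported for snpPositions.txt. To see which markers a signal file has that the PFB lacks, a minimal sketch along these lines may help. It assumes both files are tab-delimited, have a single header row, and carry the marker name in the first column; the function names are hypothetical and are not part of PennCNV.

```python
import csv


def read_marker_names(path):
    """Return the set of marker names found in the first column of a
    tab-delimited file that starts with one header row (assumed layout)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return {row[0] for row in reader if row}


def markers_missing_from_pfb(pfb_path, signal_path):
    """Return the markers present in the signal file but absent from the PFB."""
    pfb_markers = read_marker_names(pfb_path)
    signal_markers = read_marker_names(signal_path)
    return signal_markers - pfb_markers
```

Calling `markers_missing_from_pfb("out.pfb", "sample1.txt")` (both file names are placeholders) returns a set whose size should correspond to the "discarded due to lack of PFB information" count for that sample, if the assumptions about the file layout hold.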

andrisvAria commented 5 years ago

This is the command I used to compile my pfb:

compile_pfb.pl -listfile listfile.txt -output out.pfb -snpposfile C:\Users\andsav\PennCNV-1.0.5\snpPositions.txt

Listfile.txt has the following format, with 1000 different samples. These were produced using illumina_split_report.pl. [image: llistfil]

snpPositions.txt has the following format, with 618 540 markers. [image: snpPos]

What is weird is that if I do the same thing with about half of the samples, it compiles the PFB correctly:

P:\data\intensities>compile_pfb.pl -listfile listfile.txt -output out.pfb -snpposfile C:\Users\andsav\Desktop\PennCNV-1.0.5\snpPositions.txt
NOTICE: A total of 439 input signal files is specified in P:\data\intensities\listfile.txt
NOTICE: Start reading snpposfile C:\Users\andsav\Desktop\PennCNV-1.0.5\snpPositions.txt ... Done with location information for 618540 markers
NOTICE: The B Allele Freq information is annotated as column 3 in input files
NOTICE: A total of 439 input files will be used for compiling PFB values
NOTICE: PFB values for 618540 markers were written to output

P:\data\intensities>compile_pfb.pl -listfile listfile1.txt -output out1.pfb -snpposfile C:\Users\andsav\Desktop\PennCNV-1.0.5\snpPositions.txt
NOTICE: A total of 1000 input signal files is specified in P:\data\intensities\listfile1.txt
NOTICE: Start reading snpposfile C:\Users\andsav\Desktop\PennCNV-1.0.5\snpPositions.txt ... Done with location information for 618540 markers
NOTICE: The B Allele Freq information is annotated as column 3 in input files
NOTICE: A total of 1000 input files will be used for compiling PFB values
NOTICE: PFB values for 123052 markers were written to output
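Since the 439-file run yields all 618540 markers and the 1000-file run does not, one way to narrow things down is to check every file named in the list file on its own. The sketch below is only a diagnostic aid and does not identify the cause. It assumes the list file holds one path per line and that each signal file has one header row followed by one line per marker; `check_listfile` and `count_data_lines` are hypothetical helper names, not PennCNV functions.

```python
def count_data_lines(path):
    """Count non-empty lines after the header row of a signal file."""
    with open(path) as handle:
        next(handle, None)  # skip the header row
        return sum(1 for line in handle if line.strip())


def check_listfile(listfile_path, expected_markers):
    """Return (path, message) pairs for files that cannot be read or whose
    marker count differs from expected_markers."""
    with open(listfile_path) as handle:
        paths = [line.strip() for line in handle if line.strip()]

    problems = []
    for path in paths:
        try:
            found = count_data_lines(path)
        except OSError as exc:
            problems.append((path, f"cannot read: {exc}"))
            continue
        if found != expected_markers:
            problems.append((path, f"{found} markers, expected {expected_markers}"))
    return problems
```

For the run above, `check_listfile("listfile1.txt", 618540)` would return an empty list if all 1000 files are readable and complete. Any entries it does return point to individual files worth inspecting before recompiling the PFB.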