YuanTian1991 / ChAMP


Different results from the champ.filter() function with the same dataset #46

Open thscandolara opened 8 months ago

thscandolara commented 8 months ago

Greetings everyone,

Was there any change to the champ.filter() function? Last year, a TCGA dataset gave me these results:

# champ.filter(): this step can filter out non-CpG (NoCG), SNP, multi-hit, and X/Y probes.
met <- champ.filter(beta = beta_matrix,
                    pd = clinical,
                    filterXY = TRUE,
                    filterNoCG = TRUE,
                    filterSNPs = TRUE,
                    filterMultiHit = TRUE,
                    population = NULL,
                    filterDetP = TRUE)

#[ Section 2: Filtering Start >>
#Filtering NoCG Start
#Only Keep CpGs, removing 1375 probes from the analysis.
#
#Filtering SNPs Start
#Using general 450K SNP list for filtering.
#Filtering probes with SNPs as identified in Zhou's Nucleic Acids Research Paper 2016.
#    Removing 881 probes from the analysis.
#
#  Filtering MultiHit Start
#    Filtering probes that align to multiple locations as identified in Nordlund et al
#    Removing 10 probes from the analysis.
#
#  Filtering XY Start
#    Filtering probes located on X,Y chromosome, removing 6831 probes from the analysis.
#
#  Updating PD file
#    filterDetP parameter is FALSE, so no Sample Would be removed.
#
#  Fixing Outliers Start
#    Replacing all value smaller/equal to 0 with smallest positive value.
#    Replacing all value greater/equal to 1 with largest value below 1..
#[ Section 2: Filtering Done ]
#
# All filterings are Done, now you have 345347 probes and 130 samples.

Because of a problem with our saved files, we had to re-run all analyses, and now I get different output:

# champ.filter(): this step can filter out non-CpG (NoCG), SNP, multi-hit, and X/Y probes.
met <- champ.filter(beta = beta_matrix,
                    pd = clinical,
                    filterXY = TRUE,
                    filterNoCG = TRUE,
                    filterSNPs = TRUE,
                    filterMultiHit = TRUE,
                    population = NULL,
                    filterDetP = TRUE)

##[ Section 2: Filtering Start >>
#Filtering NoCG Start
#Only Keep CpGs, removing 1373 probes from the analysis.
#
#[ Section 2: Filtering Start >>
#Filtering NoCG Start
#Only Keep CpGs, removing 1414  probes from the analysis.
#
#Filtering SNPs Start
#Using general 450K SNP list for filtering.
#Filtering probes with SNPs as identified in Zhou's Nucleic Acids Research Paper 2016.
#Removing 904  probes from the analysis.
#
# Filtering MultiHit Start
#    Filtering probes that align to multiple locations as identified in Nordlund et al
#    Removing 10 probes from the analysis.
#
#  Filtering XY Start
#    Filtering probes located on X,Y chromosome, removing 7080  probes from the analysis.
#
#  Updating PD file
#    filterDetP parameter is FALSE, so no Sample Would be removed.
#
#  Fixing Outliers Start
#    Replacing all value smaller/equal to 0 with smallest positive value.
#    Replacing all value greater/equal to 1 with largest value below 1..
#[ Section 2: Filtering Done ]
#
# All filterings are Done, now you have 349956 probes and 130 samples.
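As a quick consistency check, the per-step counts in the two logs can be added back to the final probe counts to estimate how many rows each input beta matrix had. This is a sketch under stated assumptions, not a description of ChAMP's internals: it assumes the filters run one after another, that each reported count is taken from the probes still remaining, and that no step other than the four filters drops probes (the outlier step only replaces values). For the re-run it uses the complete log (NoCG = 1414), not the partial 1373 run above it.

```r
# Probes removed at each filtering step, copied from the two logs above.
# Assumption: filters are applied sequentially and each count refers to
# probes still remaining, so the removals can simply be summed.
removed_last_year <- c(NoCG = 1375, SNPs = 881, MultiHit = 10, XY = 6831)
removed_rerun     <- c(NoCG = 1414, SNPs = 904, MultiHit = 10, XY = 7080)

# Final probe counts reported at the end of each log.
final_last_year <- 345347
final_rerun     <- 349956

# Reconstructed input size = probes kept + probes removed.
input_last_year <- final_last_year + sum(removed_last_year)
input_rerun     <- final_rerun + sum(removed_rerun)

print(c(last_year = input_last_year, rerun = input_rerun))
print(input_rerun - input_last_year)
```

Under those assumptions the reconstructed inputs are 354,444 and 359,364 probes, a difference of 4,920. If that holds, the two beta matrices did not start with the same number of rows, which could be checked with `nrow(beta_matrix)` before calling champ.filter().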

It is the same input, but I cannot tell what changed the output because I no longer have the previous files to compare against. Has anything changed in this filtering step?

This matters because the difference propagated to all downstream analyses and even changed some of the sample clustering.

Many thanks in advance!