HCBravoLab / metagenomeSeq

Statistical analysis for sparse high-throughput sequencing
66 stars 20 forks source link

MRcoefs/table/fulltable: eff='x' arg not effective? #47

Open handibles opened 7 years ago

handibles commented 7 years ago

Greetings to the devs.

From the documentation:

"We recommend removing features with less than the average number of effective samples in all features. In essence, setting eff = .5 when using MRcoefs, MRfulltable, or MRtable"

I've tried a number of values for the eff='x' argument, which should filter out features represented in less than 'x' samples: however the output of MRcoef etc. shows features with any number (1 and up) of effective samples.

Additionally, I've tried excising features with less than my threshold number of samples:

z.rm <- which (calculateEffectiveSamples ( mgs_object ) < 6 )
mgs_subset <- mgs_object [ -z.rm, ]

This reduces the number of features, and accordingly calculateEffectiveSamples on the respective fit shows values above my threshold - however, when I export this object via MRfulltable, I again have features (700 out of 1700) with +samples below the threshold (+samples <6).

This is reproducible (but to a lesser extent) with the lungData set also, running the code as per page 16 and then calling: write.table(MRfulltable(lungres1, **eff=0.5**, number=2000), 'lung.tsv', sep='\t' )
This gives features with sample counts from 30 upwards, although the lungData set has 78 samples.

The only thing I can think of (other than being plain wrong) is that standardisation via ZIG is bringing some border-line abundances below 0, leaving those samples less effective?

alopgar commented 5 years ago

Hello. I just ran into the same issue and saw this open post.

When I use the MRfulltable function with that eff = .5 argument, the resultin table has all my features.

But when I use this code: ZIG <- fitZig(obj = normtest, mod = mod, control = settings, useCSSoffset = F) efec <- calculateEffectiveSamples(ZIG) valid = which(efec >= quantile(efec, p = 0.5, na.rm = TRUE)) The resulting output filters me around half of the samples.

Is there any clue about what could be happening? Thanks in advance.

jeffkimbrel commented 2 years ago

It looks like the code is doing what you wrote above and writing to valid: https://github.com/HCBravoLab/metagenomeSeq/blob/df8a28214fa9cb25870dee0e5cc909c160ce8da2/R/MRfulltable.R#L68

But then just overriding the valid filtering that is done here?: https://github.com/HCBravoLab/metagenomeSeq/blob/df8a28214fa9cb25870dee0e5cc909c160ce8da2/R/MRfulltable.R#L106

Seems like a bug to me?