ByrumLab / proteoDA

GNU General Public License v3.0
12 stars 11 forks source link

Allow zeros in S3 object? #135

Closed tjthurman closed 1 year ago

tjthurman commented 1 year ago

Currently, our zero/missing value handling is a little UAMS-specific: the read_DIA_data function reads in our MaxQuant data, in which missing values are encoded as the text string "Missing Value". Within read_DIA_data, those missing values get converted to 0. Then, later on in the protein filtering functions, any 0 values get converted to NAs. This could potentially cause issue both for us (if we imported data but then didn't filter it for some reason, we'd have 0s instead of NAs when running limma) and for our external users of the package.

Would like to get other people's opinion on this, but my first thought is that we should probably just not allow 0s in the data of our DIAlist. When I new DIAlist is created (either by our internal UAMS functions, or by the functions we'll give to external users), and 0 values should be converted to NA with a warning. Then, our S3 object validator should check for values of 0 and throw an error if it finds any.

sbyrum21 commented 1 year ago

The "missing value" comes from Scaffold DIA exported Samples report file and not MaxQuant. The missing values by default should be "NA".

In some situations, knockout experiments, I need the "0" in the matrix so that a 0 is included in the fold change calculation.

For the majority of projects, we want "NA" so limma using the na.omit function to remove the sample from the fold change calculation. Therefore, they are missing at random and ignores them in the calculation.

tjthurman commented 1 year ago

OK, for our internal read_DIA_data() function, I'll convert the "Missing Value" to NA when everything is imported (instead of 0, as it currently is), so we don't have to rely on the filtering functions to convert 0s to NAs.

Relatedly, then: I guess we shouldn't automatically convert 0s to NAs in our filtering functions, as we currently do? Instead external users can manage whether they use 0s or NAs based on the type of project they have (and maybe we could provide some helper functions to convert NAs to 0s and vice-versa)?

sbyrum21 commented 1 year ago

Yes, this was on my to-do list to make a parameter to choose 0 or NA. I forgot about it until this issue was posted.

tjthurman commented 1 year ago

OK, I'll close this issue and branch it off into two new issues that deal with the specific changes we need to make.