Bug in function `zero_to_missing()`

Klorator commented 8 months ago

Hi,

I think I've found a bug in the zero_to_missing() function. R version: 4.3.2 proteoDA version: 0.0.0.9002

I'll just post screenshots and what I think the issue is. Hopefully that is enough.

Using the browser() I see the following:

Inside zero_to_missing(), the first instance of validate_DAList() passes without errors.
After imputing NAs, the row names in new_data have been dropped.
This leads to the second instance of validate_DAList() failing. Giving me the error !Rownames for the `data` and `annotation` slots of the DAList must match

So I take it that using the [ ] intrinsically drops row names. If so, potential easy fix would be to also pull out the row names and adding them back in after the operation.

Is this only an issue for me because I have tibbles instead of pure base R data frames, or something like that?

Klorator commented 8 months ago

My interim solution is this and seems to work.

  # Convert missing values
  # raw_proteoDA <- proteoDA::zero_to_missing(raw_proteoDA)
  new_data <- DAList$data # Pull data
  new_data_rownames <- row.names(new_data) # Pull row names
  new_data[new_data == 0] <- NA # Impute NAs
  row.names(new_data) <- new_data_rownames # Add row names back in
  DAList$data <- new_data # Replace old data with new data

tjthurman commented 8 months ago

Thanks for filing a bug report!

Could you provide a reproducible example with some data that triggers the error? It seems like you've tracked down the issue and it should be a quick fix. But I'm not sure why the new_data[new_data == 0] <- NA line is dropping row names for you. Might be tibbles vs regular data frames, but the data within a DAList should be getting converted to data frames, so we might need to add another check/coercion somewhere else in the code too.

Klorator commented 8 months ago

Not sure how to best share a reproducible example. I'm trying to use this in an automated pipeline and I'm new to the whole sharing code and asking questions thing. Things are kinda messy on my end at the moment...

I imagine converting a data frame to a tibble and then constructing the DAList obj should be a way to reproduce the error. Hopefully that is something you can easily check for using an existing unit test.

After running DAList(), class(DAList$data) gives me [1] "tbl_df" "tbl" "data.frame" as do all three data frames that I feed into DAList()

I also get two warning messages with Setting row names on a tibble is deprecated. from running DAList().

So, yeah. Maybe the better solution would be an extra coercion in DAList() that scrubs away the tbl_df attributes if present.

tjthurman commented 8 months ago

No worries on the reproducible example. From the error messages and class info you've sent, I think your diagnosis is correct, and the issue in zero_to_missing() is downstream of the fact that the DAList() constructor isn't converting your input tibbles to data frames. Shouldn't be hard to fix, but with the holidays it might take me a day or two to put a fix in and test everything.

In the meantime, sounds like you've been able to work around it? If you run into other errors further down the pipeline, might be due to the annotation and metadata also being tibbles instead of data frames. The other workaround would be to just add an as.data.frame() coercion around the three tibbles that you're passing in to DAList(), if that's something you have access to in your automated pipeline.

Klorator commented 8 months ago

Thank you! I'll see what happens and the as.data.frame() solution is possible to attempt.

Enjoy your holidays and don't stress this for my sake.

ByrumLab / proteoDA

Bug in function `zero_to_missing()` #251