Closed: laurapyle closed this issue 10 months ago.
Hi @laurapyle
First, can you tell me which version of SomaDataIO you are using? CRAN v6.0.0?
Second, would it be possible to generate a small example that reproduces the error you're experiencing? It doesn't have to be the exact example you're working with; perhaps use the example_data10 dataset that comes with SomaDataIO and group (by Sex?) with dplyr::group_by().
All of the dplyr S3 methods do a check under the hood using is_intact_attr(), which is where the error is coming from, to ensure that the attributes of the soma_adat object are not corrupted by the dplyr verb. It appears they are being corrupted here; I'm just not sure exactly when or where this happens. It's possible that a recent change in dplyr has introduced this behavior.
Additionally, your error shows the filter() method being invoked somewhere, which is a bit confusing. All the more reason why a reprex would be helpful.
Side note: there are Math.generic S3 methods for the soma_adat class, so log transformations should (and can) be performed easily, as long as the class is maintained. I would not recommend applying your own custom log-transform function, since the generics exist for this purpose and have built-in checks for edge-case guarding and robustness.
Thank you for submitting an issue ... hopefully we can get this resolved (and fixed if a bug exists).
Thanks for your quick reply! I am using SomaDataIO 6.0.0. Here is a reproducible example:
library(dplyr)
library(SomaDataIO)
test <- example_data %>% arrange(SampleId)
test <- test %>% dplyr::group_by(SampleId)
test <- test %>% dplyr::filter(row_number()==1)
How would I use the generic S3 methods to log-transform the soma_adat object?
Thanks for the example ...
Log-transforming is simple with our Math generics:
apt <- "seq.3381.24" # chosen at random
median(example_data[[apt]])
new <- log10(example_data)
median(new[[apt]])
See also: https://somalogic.github.io/SomaDataIO/reference/groupGenerics.html
In the meantime ... can you shed some light on what you are trying to do? I'm thinking there is likely a workaround that doesn't involve dplyr::group_by().
For example, grouping by SampleId is a little unusual ... typically this is most useful for SampleType, but I realize this may just be your dummy example.
Thank you for your reprex ... I was able to reproduce your error.
Note for dev ... the actual bug is here:
gr_df <- dplyr::group_by(example_data, SampleType)
class(gr_df)
class(gr_df[, -1L])
The behavior is coming from [.soma_adat(), which isn't acting as expected on a "grouped_df" object. The direct call is coming from rn2col(), which uses the [.soma_adat() extraction method.
I have a dataset with 2 samples per person at different study visits, and I want to select the first visit for each person, so I arrange by ID and date drawn and then take the first visit. There probably is a way that I can work around this, but I have quite a few separate analyses that use similar logic which would all need to be modified, so I was trying to understand what caused this change in behavior.
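For the first-visit use case just described, one possible workaround is to skip grouping entirely: sort by person and draw date, then keep the first row per person with base R's duplicated(). This is a minimal sketch on a plain toy data.frame, not on a real soma_adat; the column names PersonId, DateDrawn, and seq.1234.56 and all values are hypothetical.

```r
# Hypothetical toy data: two visits for persons A and B, one for C.
# PersonId, DateDrawn, and seq.1234.56 are illustrative names, not real data.
visits <- data.frame(
  PersonId    = c("B", "A", "B", "A", "C"),
  DateDrawn   = as.Date(c("2021-03-01", "2021-01-15", "2020-11-20",
                          "2021-06-30", "2021-02-10")),
  seq.1234.56 = c(510, 498, 530, 470, 505),
  stringsAsFactors = FALSE
)

# Sort by person, then by draw date, so each person's earliest visit comes first
visits <- visits[order(visits$PersonId, visits$DateDrawn), ]

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps exactly one (the earliest) row per person
first_visit <- visits[!duplicated(visits$PersonId), ]
print(first_visit)
```

On an actual soma_adat, the same idea might be written as arrange() followed by dplyr::filter(adat, !duplicated(PersonId)), which calls filter() on an ungrouped object and never creates a grouped_df. Whether this fully sidesteps the error is an assumption, so it is worth confirming the result with is_intact_attr().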
Interesting ... I'm not sure there has been a "change" in behavior. The relevant code hasn't changed since well before SomaDataIO was released on CRAN, which was quite a while ago. The apparent offender is is_intact_attr(), which is called inside dplyr::filter.soma_adat(), but that's actually a red herring; the real problem is a few steps earlier, where the row names are preserved.
However, I do think this needs attention either way, since a group_by() |> filter() workflow is fairly common.
That is very interesting! I've been running this code repeatedly without any errors for over a year, so I assumed something had changed. I am not sure why I started getting an error about a week ago. It does explain, though, why I couldn't fix the problem by reverting to older package versions.
Hmmm. The issue is with the grouped_df methods for the dplyr verbs. They are called indirectly under the hood by NextMethod() in each of the verb methods. It's possible those methods have changed and the change cascaded into our code base that way. Still, I cannot explain why you're suddenly seeing it now, unless maybe you were previously using a different version of dplyr?
I don't think that I was using a different version of dplyr, although I could be mistaken. I also tried reverting to earlier versions of dplyr, and that didn't fix the problem.
Actually, I was able to check the history of updates for dplyr, and I believe I was using a different version previously. I also don't think I had successfully reverted to a prior version of dplyr; it couldn't be unloaded because it had been imported by tidyr. So it's possible that the change in dplyr version is what caused the issue.
Either way, it's worth fixing so that future dplyr changes don't break our class. So thank you for bringing it to my attention.
Hello,
I am rerunning some code that uses SomaDataIO that I haven't run in several months and it does not seem to work as before. I think the issue has to do with dplyr verbs. When I use dplyr::group_by() followed by filter() on a soma_adat, I get the following message:
The object is not a 'soma_adat' class objects: 'grouped_df', 'tbl_df', 'tbl', 'data.frame' Error in filter.soma.adat(., row_number()=1): is_intact_attr(.data) is not TRUE.
I figured out a workaround for that issue, but when I applied a custom function to log-transform the proteins, I got the same message repeated over and over: "Attributes has only 3 entries: 'names', 'row.names', 'class'."
I have searched closed issues on github and found some similar issues, but most of them were quite old and marked as complete. I tried reverting to dplyr version 1.0.6 per #15 but got an error when trying to load SomaDataIO, which needed at least version 1.0.10 of dplyr.
I am not sure why this has happened suddenly - thanks in advance for any help!
Laura Pyle