bioFAM / MOFA2

Multi-Omics Factor Analysis
https://biofam.github.io/MOFA2/
GNU Lesser General Public License v3.0

`correlate_covariates`: redundant conversion to numeric and incorrect warning #152

Open artur-sannikov opened 3 months ago

artur-sannikov commented 3 months ago

In correlate_covariates.R on line 52, the code converts the columns to numeric twice, with a warning in between. Line 56 is not required: it is enough to throw the warning and then convert the columns to numeric once.
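A minimal sketch of what the simplified logic could look like, assuming the intent is to warn once and convert once. The helper name `coerce_covariates_to_numeric` is hypothetical and not part of MOFA2; only base R is used:

```r
# Hypothetical helper (not part of MOFA2): warn once, then convert once.
coerce_covariates_to_numeric <- function(covariates) {
  # Indices of columns that are not numeric (double or integer)
  cols <- which(!vapply(covariates, is.numeric, logical(1)))
  if (length(cols) >= 1) {
    warning("There are non-numeric values in the covariates data.frame, converting to numeric...")
    # Single conversion; factor columns become their integer level codes
    covariates[cols] <- lapply(covariates[cols], as.numeric)
  }
  stopifnot(all(vapply(covariates, is.numeric, logical(1))))
  covariates
}

# Usage: an all-numeric data.frame passes through without a warning
numeric_only <- data.frame(nums = runif(5, 0, 1000))
converted <- coerce_covariates_to_numeric(numeric_only)
```

With this shape, the warning is raised only when at least one column actually needs converting.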

Another issue is that if I run this function with a covariates data.frame that contains only numeric columns, I still get this warning.

Below, I generate some random numbers and assign them to a column:

nums <- runif(179, 0, 1000)
covariates <- data.frame(nums = nums)
row.names(covariates) <- colData(mae[[2]])[, "sample"]

correlate_factors_with_covariates(object = model,
                                  covariates = covariates
)
Warning: There are non-numeric values in the covariates data.frame, converting to numeric...

I confirmed that the covariates data.frame contains only numeric values:

which(!sapply(covariates,class)%in%c("numeric","integer"))
> integer(0)

If I run the code from the function body directly, I don't get this warning:

nums <- runif(179, 0, 1000)
covariates <- data.frame(nums = nums)
row.names(covariates) <- colData(mae[[2]])[, "sample"]

cols <- which(!sapply(covariates, class)%in%c("numeric","integer"))
if (length(cols>=1)) {
  cols.factor <- which(sapply(covariates,class)=="factor")
  covariates[cols] <- lapply(covariates[cols], as.numeric)
  warning("There are non-numeric values in the covariates data.frame, converting to numeric...")
  covariates[cols] <- lapply(covariates[cols], as.numeric)
}
stopifnot(all(sapply(covariates,class)%in%c("numeric","integer")))

mae is my MultiAssayExperiment object.
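A side note on the condition in the block above: `length(cols>=1)` measures the length of the logical vector `cols >= 1`, not whether `cols` is non-empty. It appears to give the same outcome as `length(cols) >= 1`, because `if()` treats a non-zero length as TRUE. A small sketch with illustrative index vectors (the variable names are assumptions for this example only):

```r
# Illustrative index vectors, as which() would return them
cols_empty <- integer(0)   # no non-numeric columns
cols_some  <- c(2L, 3L)    # two non-numeric columns

# Condition as written: length of the logical vector (cols >= 1)
as_written_empty <- length(cols_empty >= 1)
as_written_some  <- length(cols_some >= 1)

# More explicit form: is the index vector non-empty?
explicit_empty <- length(cols_empty) >= 1
explicit_some  <- length(cols_some) >= 1

# if() coerces the numeric length to logical, so both forms agree
agree <- identical(as.logical(as_written_empty), explicit_empty) &&
  identical(as.logical(as_written_some), explicit_some)
print(agree)
```

So this condition is probably not the source of the unexpected warning, although the explicit form states the intent more clearly.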

artur-sannikov commented 3 months ago

While trying to create a more reproducible example with example data, I ran into an error, so I followed the checks from line 32.

library(ggplot2)
library(MOFA2)

# Create example data
data <- make_example_data(
  n_views = 2,
  n_samples = 200,
  n_features = 100,
  n_factors = 10
)[[1]]

# Add metadata
N <- ncol(data[[1]])
groups <- c(rep("A", N / 2), rep("B", N / 2))

# Create MOFA object
MOFAobject <- create_mofa(data, groups = groups)

# Prepare MOFA object
MOFAobject <- prepare_mofa(
  object = MOFAobject
)

# Train MOFA
outfile <- file.path(tempdir(), "model.hdf5")
MOFAobject.trained <- run_mofa(MOFAobject, outfile, use_basilisk = TRUE)

# For better readability
model <- MOFAobject.trained

# Assign some metadata to covariates table
nums <- runif(200, 0, 1000)
covariates <- data.frame(nums = nums)
metadata <- samples_metadata(model)
samples <- metadata$sample
rownames(covariates) <- samples_metadata(model)$sample

# Run function
correlate_factors_with_covariates(
  object = model,
  covariates = covars
)
Error in correlate_factors_with_covariates(object = model, covariates = covars) : 
  all(rownames(covariates) %in% samples) is not TRUE

But the check passes if I run it myself:

all(rownames(covariates) %in% samples)
[1] TRUE

I'm not sure what causes this error, but I can open a separate issue for it.
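One way to narrow down a failing `all(rownames(covariates) %in% samples)` check is to list the row names that are absent from the sample names, for exactly the object that is passed in the call. The sketch below uses toy stand-ins for the sample names and the covariates table; `missing_samples` is a hypothetical helper, not a MOFA2 function:

```r
# Toy stand-ins for samples_metadata(model)$sample and the covariates table
samples <- c("sample_1", "sample_2", "sample_3")
covariates <- data.frame(
  nums = c(10, 20, 30),
  row.names = c("sample_1", "sample_2", "sample_X")
)

# Hypothetical helper: row names of `covariates` that are not sample names
missing_samples <- function(covariates, samples) {
  setdiff(rownames(covariates), samples)
}

# Any names returned here are the ones that make the check fail
missing_samples(covariates, samples)
```

An empty result means every row name matches a sample, so the check should pass for that object.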