## Requires MOFA2, dplyr, tidyr and purrr to be installed; the pipe comes from magrittr
library(magrittr)

get_weights_custom <- function(object, views = "all", factors = "all", abs = FALSE,
                               scale = "none", as.data.frame = FALSE) {
  if (!methods::is(object, "MOFA"))
    stop("'object' has to be an instance of MOFA")
  if (!(scale %in% c("none", "by_view", "by_factor", "overall")))
    stop("scale should be one of: 'none', 'by_view', 'by_factor', 'overall'.")

  ## These are unexported MOFA2 helpers, so they may change between versions
  views <- MOFA2:::.check_and_get_views(object, views)
  factors <- MOFA2:::.check_and_get_factors(object, factors)

  ## Get the raw weights as a long data-frame by default,
  ## and filter the relevant views and factors
  weights <- MOFA2::get_expectations(object, "W", as.data.frame = TRUE) %>%
    dplyr::filter(view %in% views,
                  factor %in% factors)

  ## Transform to absolute value if needed
  ## (base::abs is spelled out because the argument is also named 'abs')
  if (abs) weights$value <- base::abs(weights$value)

  ## If some scaling must be performed
  if (scale != "none") {
    if (scale == "by_view") weights <- dplyr::group_by(weights, view)
    if (scale == "by_factor") weights <- dplyr::group_by(weights, factor)

    ## When scaling by view or factor, the grouping ensures that the maximum
    ## value is selected within the relevant group (i.e. either view or factor);
    ## otherwise, if no grouping is performed (when scale = 'overall'), this takes
    ## the max absolute value across the entire dataset.
    weights <- weights %>%
      dplyr::mutate(value = value / max(base::abs(value))) %>%
      dplyr::ungroup()
  }

  ## If we want to return a list, transform the long data-frame into a list
  ## of tibbles (can add as.data.frame() at the end if we don't want a tibble)
  if (!as.data.frame) {
    weights <- purrr::map(views,
                          ~ weights %>%
                            dplyr::filter(view == .x) %>%
                            tidyr::pivot_wider(names_from = factor,
                                               values_from = value) %>%
                            dplyr::select(-view))
    names(weights) <- views
  }

  return(weights)
}
Hi,

I have been playing with MOFA and MEFISTO, and I found some inconsistency in the way the feature weights are scaled when using the function get_weights(). Here is a reproducible example:

If I extract the weights as a list with scale = TRUE, I get:

While if I extract the scaled weights as a data-frame with as.data.frame = TRUE, I get:

From what I can tell, this is because in the get_weights() function, if as.data.frame is FALSE, the scaling is performed as follows:

Which means that the weights are scaled independently for each view. If on the other hand as.data.frame is TRUE, then the scaling is performed at once on the whole data-frame, so across all views and factors:

I noted that the scaled weights obtained with as.data.frame = TRUE and as.data.frame = FALSE are identical for features from view 1 since it's the view that has the highest absolute weight, which is used for scaling the entire dataset when as.data.frame = TRUE:

Possible solution?
I personally find it more useful if the weights are scaled by factor rather than by view, as it allows me to see which features are contributing the most to a given factor across all views. But there are times where scaling by view is very useful as well.
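To make the difference between these scaling modes concrete, here is a minimal base-R sketch on made-up numbers. The toy data frame and the helper scale_within() are hypothetical and not part of MOFA2; they only mimic the long format (view, feature, factor, value) and divide each weight by the largest absolute weight in its group.

```r
# Toy long-format weights: two views, two factors, two features per view.
# All values are invented for illustration only.
weights <- data.frame(
  view    = rep(c("view_1", "view_2"), each = 4),
  feature = c("f1", "f2", "f1", "f2", "g1", "g2", "g1", "g2"),
  factor  = rep(rep(c("Factor1", "Factor2"), each = 2), times = 2),
  value   = c(4, -2, 1, 0.5, 1, -0.5, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: divide each value by the largest absolute value
# found within its group. A constant group label gives one global group.
scale_within <- function(value, group) {
  value / ave(abs(value), group, FUN = max)
}

weights$by_view   <- scale_within(weights$value, weights$view)
weights$by_factor <- scale_within(weights$value, weights$factor)
weights$overall   <- scale_within(weights$value, rep("all", nrow(weights)))

print(weights)
```

In this toy data view_1 holds the largest absolute weight overall, so its by_view and overall columns coincide, while view_2 differs between the two. With by_factor, features from both views are directly comparable within each factor.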
Maybe that could become an option? E.g. the scaling parameter could be 'none', 'by_view', 'by_factor', or even 'overall'. I made a custom version of the get_weights() function with this in mind (get_weights_custom(), defined above). And now the scaling is consistent:

I would love to use something like this for plot_weights() and plot_top_weights() as well.

Hope that makes sense, and thanks for your help!