Closed jrosen48 closed 4 years ago
Point 1: We could change the standard digits
argument to 2. Mplus always reports 3 digits, Mclust as many as R allows.
Point 2: The main problem is that Mplus uses many sets of random starts because mixture models notoriously have convergence issues, and Mclust just uses one set of random starts. You can pass additional starts to Mplus through ...
, but not to Mclust.
Re point 1, that sounds good. Could you make this change when you have the chance?
Re point 2... yeah. We do have some options that can be passed to mclust::Mclust()
to control the estimation (i.e., the number of iterations allowed for the EM algorithm and the convergence tolerance: https://rdrr.io/cran/mclust/man/emControl.html). And, there are different ways that EM can be initialized: https://www.rdocumentation.org/packages/mclust/versions/5.4.1/topics/Mclust. But, I agree that these don't (easily) let us reproduce what MPlus does... and that it's probably not the best use of our time to go down that road. Nevertheless, I'd like to at least show how to pass additional starts (and possibly control other aspects of the estimation) for MPlus. Does that sound aight?
Still to do: show how to pass additional arguments through to mclust/mplus
assigned myself
@cjvanlissa coming back to this; I'm trying to pass additional arguments on to MPlus
, but the ...
argument (or what I'm entering for it) isn't behaving as I expected, e.g.:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyLPA)
#> tidyLPA has received a major update, with a much easier workflow and improved functionality. However, you might have to update old syntax to account for the new workflow. See vignette('introduction-to-major-changes') for details!
#>
#> Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
iris_sample <- iris[c(1:4, 51:54, 101:104), ] # to make example run more quickly
iris_sample %>%
subset(select = c("Sepal.Length", "Sepal.Width",
"Petal.Length")) %>%
estimate_profiles(3, package = "mplus", "ANALYSIS: starts = 100,20;")
#> The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
#> Warning in Ops.factor(mn, 2): '%%' not meaningful for factors
#> Warning in Ops.factor(mn, 2): '/' not meaningful for factors
#> Warning in runModels(target = modelout, Mplus_command = Mplus_command, logFile = NULL): Mplus returned error code: 1, for model: model_ANALYSIS: starts = 100,20;_class_3.inp
#> Warning:
#> One or more analyses resulted in warnings! Examine these analyses carefully: model_ANALYSIS: starts = 100,20;_class_3
#> tidyLPA analysis using mplus:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 1 3
Created on 2019-10-10 by the reprex package (v0.3.0)
Can you advise re: passing along additional arguments (such as starts = 100, 20
)?
Have you tried taking the argument name out of quotation marks?
I am guessing the issue can be traced to here. No ...
are passed to mplusModeler
, although the documentation says it does.
@Gootjes thank you, fixed now!
Sorry for not checking my own solution properly before suggesting it.
There is another line that triggers an error. The ...
are stored in the Args
variable, and this variable is passed to mplusObject
before being passed on to mplusModeler
later on. However, that function does not accept the same arguments as mplusModeler
, leading to Mplus_command = "X:/Software/MPlus8/mplus.exe"
being rejected:
> tidyLPA::estimate_profiles(data = d, package = "mplus", n_profiles = 3, models = 1, Mplus_command = "X:/Software/MPlus8/mplus.exe")
Error in (function (TITLE = NULL, DATA = NULL, VARIABLE = NULL, DEFINE = NULL, :
unused argument (Mplus_command = "X:/Software/MPlus8/mplus.exe")
I do not know of an easy fix, except explicitly filtering valid arguments from ...
, which should do the job.
OK, it will be a while before I get to it. If you want to make a pull request to fix it, that would be much appreciated!
@Gootjes thank you for working on this issue. I tried to pass the argument "ANALYSIS: starts = 100,20;"
but seem to be running into problems. I installed the latest development version (therefore with #133 merged).
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris_sample <- iris[c(1:4, 51:54, 101:104), ] # to make example run more quickly
# No arguments passed, works fine
iris_sample %>%
subset(select = c("Sepal.Length", "Sepal.Width",
"Petal.Length")) %>%
estimate_profiles(3,
package = "mplus")
#> tidyLPA analysis using mplus:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 1 3 62.15 68.94 1.00 1.00 1.00 0.17 0.50 0.03
# With an argument passed on to mplusModeler, returns a warning
iris_sample %>%
subset(select = c("Sepal.Length", "Sepal.Width",
"Petal.Length")) %>%
estimate_profiles(3,
package = "mplus",
"ANALYSIS: starts = 100,20;")
#> The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
#> Warning in Ops.factor(mn, 2): '%%' not meaningful for factors
#> Warning in Ops.factor(mn, 2): '/' not meaningful for factors
#> Warning in runModels(target = modelout, Mplus_command = Mplus_command, logFile = NULL): Mplus returned error code: 1, for model: model_ANALYSIS: starts = 100,20;_class_3.inp
#> Warning:
#> One or more analyses resulted in warnings! Examine these analyses carefully: model_ANALYSIS: starts = 100,20;_class_3
#> tidyLPA analysis using mplus:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 1 3
Created on 2019-11-24 by the reprex package (v0.3.0)
Thank you for your work on this!
You are passing an unnamed string as an argument. The formal argument of mplusModeler()
is called ANALYSIS
!
Sorry, I'm slow *in general and just waking up (and am not seeing this documented in mplusAutomation::mplusModeler()
or mplusAutomation::prepareMplusdata()
; just focusing on this last line how could I pass an additional argument such as "ANALYSIS: starts = 100,20;"
along to mplusModeler()
?
The argument is called ANALYSIS
, so you have to pass
ANALYSIS = "starts = 100, 20"
Instead of
"ANALYSIS: starts = 100,20;"
What am I missing/doing incorrectly here?
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris_sample <- iris[c(1:4, 51:54, 101:104), ] # to make example run more quickly
# With an argument passed on to mplusModeler, returns a warning
iris_sample %>%
subset(select = c("Sepal.Length", "Sepal.Width",
"Petal.Length")) %>%
estimate_profiles(3,
package = "mplus",
ANALYSIS = "starts = 100,20;")
#> Error in prepareMplusData(df = data[i, , drop = FALSE], keepCols = object$usevariables, : unused argument (ANALYSIS = "starts = 100,20;")
Created on 2019-11-25 by the reprex package (v0.3.0)
That syntax is correct, and if it's not working, then we have a bug! Does this fail on both the CRAN and development version?
ah hah, yes. I know @Gootjes worked on a fix for this - wondering if it's specific to this argument? That fix (#133) was after we sent the last version to CRAN (2019-08-19), and so wouldn't reflect those changes... but the CRAN version (I installed via install.packages()
and restarted) seems to fail too:
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris_sample <- iris[c(1:4, 51:54, 101:104), ] # to make example run more quickly
# With an argument passed on to mplusModeler, returns a warning
iris_sample %>%
subset(select = c("Sepal.Length", "Sepal.Width",
"Petal.Length")) %>%
estimate_profiles(3,
package = "mplus",
ANALYSIS = "starts = 100, 20;")
#> Warning in system2(Mplus_command, args = c(shQuote(inputSplit$filename)), :
#> error in running command
#> Warning in runModels(target = modelout, Mplus_command = Mplus_command, logFile = NULL): Mplus returned error code: 127, for model: model_1_class_3.inp
#> Error in getOutFileList(target, recursive, filefilter): Specified target does not exist.
#> Target: model_1_class_3.out
Created on 2019-11-25 by the reprex package (v0.3.0)
Let's discuss tomorrow! Should be an easy fix.
Why I've been working on this - I want to demonstrate how to do this in our tidyLPA paper and also to demonstrate how to pass additional args to mplusModeler()
or mclust()
. Maybe cold comfort, but it seems to work for mclust()
, as this demonstrates in the context of the prior
argument:
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(mclust)
#> Package 'mclust' version 5.4.5
#> Type 'citation("mclust")' for citing this R package in publications.
# Passing arguments to mclust
iris[, -5] %>% estimate_profiles(2:4, model = 6, prior = priorControl())
#> Warning: You are using a deprecated argument when calling 'estimate_profiles.default'. Check the documentation. Deprecated arguments are:
#> model: Instead, use the argument 'models', or 'variances' and 'covariances'.
#> The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
#> tidyLPA analysis using mclust:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 6 2 505.20 592.51 1.00 1.00 1.00 0.33 0.67 0.01
#> 6 3 473.40 605.87 0.96 0.97 1.00 0.32 0.35 0.01
#> 6 4 487.30 664.93 0.96 0.98 0.99 0.04 0.36 0.66
summary(mclustBIC(iris[,-5], modelNames = "VVV"))
#> Best BIC values:
#> VVV,2 VVV,3 VVV,4
#> BIC -574.0178 -580.839630 -630.59996
#> BIC diff 0.0000 -6.821798 -56.58213
iris[, -5] %>% estimate_profiles(2:4, model = 6, prior = priorControl(shrinkage = 0))
#> Warning: You are using a deprecated argument when calling 'estimate_profiles.default'. Check the documentation. Deprecated arguments are:
#> model: Instead, use the argument 'models', or 'variances' and 'covariances'.
#> The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
#> tidyLPA analysis using mclust:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 6 2 504.76 592.07 1.00 1.00 1.00 0.33 0.67 0.01
#> 6 3 472.93 605.40 0.96 0.97 1.00 0.32 0.35 0.01
#> 6 4 486.66 664.29 0.96 0.98 0.99 0.04 0.36 0.73
summary(mclustBIC(iris[,-5], modelNames = "VVV", prior = priorControl(shrinkage = 0)))
#> Best BIC values:
#> VVV,2 VVV,3 VVV,4
#> BIC -592.0713 -605.39648 -664.28888
#> BIC diff 0.0000 -13.32516 -72.21757
Created on 2019-11-25 by the reprex package (v0.3.0)
Hopefully fixed with #134.
When I wrote the previous fix, I assumed wrongly that all arguments from ...
could be passed to mplusModeler
. Removing/excluding arguments from ...
that have been used in mplusObject
(such as the ANALYSIS
statement) when calling mplusModeler
should fix it, see pull request #134.
Maybe you can check that out to see if it fixes your issue?
The second issue you reported (the one with the CRAN version) originates in a different function call, which is the exact error I get when I misspecified the Mplus_command
argument that is used down the line in mplusModeler
, as my mplus installation is in an odd directory. Is that the case for you as well? If not, we have another bug, and my original fix does not fix your issue. Let me know!
Thanks Gootjes, I figured as much... appreciate your work on this, I'm gonna check the code after december 5th!
@Gootjes thanks for all of your work on this. I merged the PR and ran again - that same issue is coming up:
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
library(dplyr, warn.conflicts = FALSE)
pisaUSA15[1:100, ] %>%
select(broad_interest, enjoyment, self_efficacy) %>%
single_imputation() %>%
estimate_profiles(3, package = "MplusAutomation",
ANALYSIS = "starts = 100, 20;")
#> Error in prepareMplusData(df = data[i, , drop = FALSE], keepCols = object$usevariables, : unused argument (ANALYSIS = "starts = 100, 20;")
Created on 2019-11-30 by the reprex package (v0.3.0)
Using the current master branch, I can no longer reproduce the issue. Perhaps your code is accidentally using an older version? These version checksums should be the same as yours:
# This displays your currently installed version (installed from github):
remotes:::local_sha("tidyLPA")
#> [1] "f3e47446b118dc4581123f262996b30c9df0d07c"
# This displays the latest version that is on github.
remotes:::remote_sha(remotes:::github_remote("data-edu/tidyLPA"))
#> [1] "f3e47446b118dc4581123f262996b30c9df0d07c"
Created on 2019-12-01 by the reprex package (v0.3.0)
library(tidyLPA)
#> tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is installed; you can use package = 'MplusAutomation' when calling estimate_profiles().
library(dplyr, warn.conflicts = FALSE)
pisaUSA15[1:100, ] %>%
select(broad_interest, enjoyment, self_efficacy) %>%
single_imputation() %>%
estimate_profiles(3, package = "MplusAutomation",
ANALYSIS = "starts = 100, 20;")
#> tidyLPA analysis using mplus:
#>
#> Model Classes AIC BIC Entropy prob_min prob_max n_min n_max BLRT_p
#> 1 3 626.07 662.54 0.81 0.84 0.95 0.03 0.67 0.00
Created on 2019-12-01 by the reprex package (v0.3.0)
@jrosen48 your code works for me, too. I'm developing unit tests to check that the additional arguments are also being correctly evaluated.
- We can make some changes to how numbers are rounded, as they're presently quite different between the two packages.
Actually, I think every print function correctly rounds the numbers. The raw data themselves are (of course) not rounded. Your vignette for comparing Mclust and Mplus shows different decimals, because you're using as.data.frame()
on the tibble
returned by get_fit()
. This triggers the print.data.frame()
function instead of print.tibble()
.
I recommend that, for this vignette, you print using knitr::kable()
or even DT::datatable()
.
PS: I'm closing this thread because it is very lengthy. Let's reopen new, more specific issues for benchmarking, with one issue per thread!
I've tried to show the results of benchmarking mclust and MPlus here:
https://data-edu.github.io/tidyLPA/articles/benchmarking-mclust-and-mplus.html
We can make some changes to how numbers are rounded, as they're presently quite different between the two packages.
Perhaps more importantly, even with a relatively simple data set, as models become more complex, it becomes more likely that they do not reach the same exact solution. I wonder if one way we can head this off is by demonstrating how the estimation can be controlled, i.e., by increasing the number iterations or changing the convergence criterion. Presently, we state that additional arguments can be passed to
estimate_profiles()
as follows: `Can we show an example of how to add these additional arguments for both estimating functions?