alexanderrobitzsch / TAM

Test Analysis Modules
https://alexanderrobitzsch.github.io/TAM/

Design matrix creation takes over a week #9

Closed ck37 closed 5 years ago

ck37 commented 5 years ago

Hello,

Thanks for such an excellent open source IRT package. I am using it to fit a many-facet PCM for hate speech scaling, but after 8 days of running it still has not finished creating the design matrix.

Here is the current output:

....................................................
Processing Data      2019-09-15 16:57:59 
        o Check facets ( 2019-09-15 16:57:59 )
          * Changed levels of facet rater: '_10873' '_2790_', ...
[...]
 o Create design matrix A
          |**********|
          |

My dataset consists of ~36,000 comments, ~11,000 raters, and 9 items (4-5 response options each). Most comments are rated 4 times. There is a reference set that links all raters, in which each comment is rated 500-1000 times. I have removed the reference set from the current analysis in an attempt to speed it up (raters should still be linked through overlapping comment ratings).
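Whether raters really remain linked after dropping the reference set can be checked before fitting. The following is a minimal sketch in base R, not part of TAM: it counts the connected components of the rater-comment graph with a simple union-find. The `ratings` data frame, its column names, and the helper `count_components` are hypothetical and only for illustration; a result of 1 would mean all raters are linked.

```r
# Hypothetical toy data: one row per rating (rater id, comment id).
ratings <- data.frame(
  rater   = c("r1", "r1", "r2", "r2", "r3", "r4"),
  comment = c("c1", "c2", "c2", "c3", "c3", "c4")
)

# Count connected components of the bipartite rater-comment graph.
# Hypothetical helper using a plain union-find (no path compression).
count_components <- function(rater, comment) {
  rater_nodes   <- paste0("rater:", rater)
  comment_nodes <- paste0("comment:", comment)
  nodes  <- unique(c(rater_nodes, comment_nodes))
  parent <- seq_along(nodes)  # every node starts as its own root

  # Follow parent pointers until a root is reached.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  a <- match(rater_nodes, nodes)
  b <- match(comment_nodes, nodes)
  for (k in seq_along(a)) {
    root_a <- find_root(a[k])
    root_b <- find_root(b[k])
    # Each rating joins the rater's and the comment's components.
    if (root_a != root_b) parent[root_b] <- root_a
  }

  roots <- vapply(seq_along(nodes), find_root, integer(1))
  length(unique(roots))
}

n_components <- count_components(ratings$rater, ratings$comment)
cat("connected components:", n_components, "\n")
```

In the toy data, rater r4 only rated comment c4, which nobody else rated, so the graph splits into more than one component. On real data with tens of thousands of nodes, a graph library would likely be faster than this loop.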

Here is my mfr command:

model =
  tam.mml.mfr(scale_data,
              facets = facet_mat,
              formulaA =~ item + item:step + rater,
              constraint = "items",
              control = list(maxiter = 100, fac.oldxsi = 0.4),
              pid = obs_id)

Any suggestions on how to speed up the analysis? It looks like one possible area for improvement is to add multicore support for the design matrix creation, which I'd be open to helping with.

As for the fitting itself, I am also running into convergence issues (identified on a smaller pilot dataset of 4,000 comments), which is why I include the fac.oldxsi argument. I see the same convergence issues in ConQuest.

I would be open to sharing the data if it would be helpful.

Many thanks, Chris

alexanderrobitzsch commented 5 years ago

The facets module is not built for facets with a huge number of levels. Internally, 9 * 11,000 = 99,000 pseudo-items are created, which is quite a mess. Treating raters as random might help in your case. Joint maximum likelihood (JML) estimation without the tedious transformation might also help. Some time ago I started a JML implementation for facets with many levels here:

https://github.com/alexanderrobitzsch/immer/blob/master/R/immer_jml_facets.R

I have not completed the function yet, but I think this could be a way forward.
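To give a feel for why the pseudo-item expansion mentioned above becomes unmanageable, here is a back-of-the-envelope sketch in R. The counts come from the issue text; the dense pseudo-items x categories x parameters layout and the parameter count are assumptions made for illustration only, not a statement about how TAM stores the design matrix internally.

```r
# Counts taken from the issue text.
n_items   <- 9       # items in the instrument
n_raters  <- 11000   # approximate number of raters
n_cat_max <- 5       # at most 5 response options per item

# One pseudo-item per item-by-rater combination.
n_pseudo_items <- n_items * n_raters

# Rough upper bound on parameters: item/step parameters plus one per rater
# (identification constraints are ignored here).
n_par <- n_items * (n_cat_max - 1) + n_raters

# ASSUMPTION: a dense array of doubles (8 bytes each) with dimensions
# pseudo-items x categories x parameters.
n_cells  <- n_pseudo_items * n_cat_max * n_par
size_gib <- n_cells * 8 / 1024^3

cat("pseudo-items:", n_pseudo_items, "\n")
cat("parameters (rough upper bound):", n_par, "\n")
cat("approx. dense size in GiB:", round(size_gib, 1), "\n")
```

Under these assumptions a dense representation would run to tens of gigabytes, which is consistent with the suggestion that an approach avoiding the expansion is the more promising route.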

ck37 commented 5 years ago

Thanks. I have re-implemented the model in Linacre's Facets software, which runs the scaling in 10 minutes, so I'm going with that in the short term.

That said, over the medium term I aspire to help get this working in TAM directly.

alexanderrobitzsch commented 5 years ago

The implementation in immer is similar to Facets (a JML variant). I do not think that one can use the original TAM implementation, because it is not meant to be applied in constellations like yours.

ck37 commented 5 years ago

Gotcha, that makes sense - I will go ahead and close this issue.