cnchapman / choicetools

Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys
Apache License 2.0

Example md.define formatted dataset #5

Open Tixeerif opened 3 weeks ago

Tixeerif commented 3 weeks ago

Hi Chris, I'm a long-time Sawtooth user, but having changed jobs recently I no longer have access to it, and I'm really keen to try your choicetools package for MaxDiff analysis.

I understand we need an md.define dataset object to run the analysis and that there are functions to convert .cho or Qualtrics exports into this format, but I have neither available to me as we use different scripting software!

Is it possible to use a pre-formatted dataset as the md.define object directly from, e.g., SPSS or .csv? If so, can you provide an example of how that data should be formatted to work with the md.hb function?

Apologies if there is an example somewhere and I haven't found it. Despite a few years in the field I'm pretty new to GitHub / R and am finding the documentation tricky to follow. I've found the example qualtrics-pizza-maxdiff.csv file but not an example of a resulting md.define formatted object.

Many thanks, Scott

cnchapman commented 3 weeks ago

Hi Scott --

It depends on the data and what you're trying to do. In my package, md.define can hold several different things, such as raw choice survey responses (e.g., from Qualtrics or Sawtooth) along with estimated utilities. Do you have raw survey observations or estimated utilities?

The formats themselves are demonstrated in the various data files and documented in the Quant UX Book (https://quantuxbook.com). However, if you are mostly looking to work with and plot already-estimated utilities, then you can check out some simpler, recent examples on my blog: quantuxblog.com

HTH!

-- Chris

Tixeerif commented 3 weeks ago

Thanks for the quick reply, Chris! I have raw survey observations, so I'm planning to use choicetools to estimate the utilities and eventually rescaled preference shares.

I'm currently working my way through one of your other books (R for Marketing Research and Analytics) but will see if I can pick up a copy of the UX book too, as that has a specific section on MaxDiff, and will have a look at the examples on the accompanying webpage - thanks for the link.

My dataset is currently formatted as one row per respondent with two variables per task, flagging the best and worst selections for each task - so identical to how I would normally import it into Sawtooth. It's already very cleanly formatted so I was hoping to use it without too much additional work!

Do you have an example .cho formatted MaxDiff dataset? I tried playing with the pizza Qualtrics dataset and got that to work nicely with the parse function, but the original Qualtrics format is probably hard to replicate with my dataset. I might be able to coerce it into a .cho format, which I could then use with the choicetools package - I just don't know how that .cho should look!

Thanks, Scott

cnchapman commented 3 weeks ago

Hi there --

If you in fact only have the best and worst, then there is not enough data to estimate a standard MaxDiff using conditional logit and/or hierarchical Bayes. Those methods also need to know which items were shown but were not chosen as either best or worst.

Here's why. Suppose I know that "A" won and "E" lost on a specific trial. That gives the info: A > E. I don't know anything else. But if I know that B, C, and D were shown on the same task, then I know much more: A>B, A>C, A>D, A>E, B>E, C>E, D>E. That additional information is needed to estimate a model with the kind of sparse data that MaxDiff surveys typically provide.
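To make the comparison above concrete, here is a minimal Python sketch (not part of choicetools; the function name `implied_pairs` is made up for illustration) that lists the pairwise preferences implied by a single task once the full set of shown items is known:

```python
def implied_pairs(shown, best, worst):
    """Return (preferred, less_preferred) pairs implied by one MaxDiff task.

    Hypothetical helper for illustration only; not a choicetools function.
    """
    if best not in shown or worst not in shown or best == worst:
        raise ValueError("best and worst must be two different shown items")
    # The best item beats every other item that was shown.
    pairs = [(best, item) for item in shown if item != best]
    # Every remaining shown item beats the worst item.
    pairs += [(item, worst) for item in shown if item not in (best, worst)]
    return pairs


pairs = implied_pairs(["A", "B", "C", "D", "E"], best="A", worst="E")
print(pairs)
```

With five items shown, this yields the seven comparisons listed above; with only the best and worst known, it would yield just the single pair A > E.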

As for data format, the Pizza CSV file you reference demonstrates the format. The "design order" columns at the far right-hand side give the list of which items were shown on each task. Then the other columns show, in extended format, which item won (coded as "2") and which item lost (coded as "1"). A blank means either the item wasn't shown, or -- if it appears in the design order column -- it was shown but not chosen as best or worst. You could build your data into that format IF you know which items were shown alongside the winners/losers.
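As a rough illustration of that coding scheme, the Python sketch below encodes one task from a best/worst pair plus its design. Everything about the layout here is an assumption: the function name `encode_task`, the dictionary structure, and the way the design order is stored are hypothetical, and the real column headers and header rows should be copied from the pizza CSV itself.

```python
def encode_task(all_items, shown, best, worst):
    """Encode one MaxDiff task in the '2 = best, 1 = worst, blank = other' scheme.

    Hypothetical layout for illustration; check the pizza CSV for the real one.
    """
    if best not in shown or worst not in shown:
        raise ValueError("best and worst must both appear in the design")
    # One cell per item in the study; blank unless chosen as best or worst.
    cells = {item: "" for item in all_items}
    cells[best] = "2"
    cells[worst] = "1"
    # The design order records which items were shown, so that a blank cell
    # can be read as "shown but not chosen" versus "not shown".
    return {"cells": cells, "design_order": list(shown)}


row = encode_task(
    all_items=["A", "B", "C", "D", "E", "F"],
    shown=["A", "B", "C", "D", "E"],
    best="A",
    worst="E",
)
print(row)
```

Here item F is blank because it was not shown, while B, C, and D are blank because they were shown but not picked; only the design order distinguishes the two cases.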

If you don't have the data to know what else was shown on each task, then my suggestion is to do an approximate "counts" analysis where you report the proportion of Best minus the proportion of Worst. It will not be an exact counts analysis because, again, you wouldn't actually know whether the items were all shown the exact same number of times. (Or maybe you know that from the survey design?) You can find an example of counts analysis here: https://quantuxblog.com/easy-maxdiff-in-r
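For a sense of what such an approximate score looks like, here is a small Python sketch. It is not the code from the blog post; the function name `approximate_counts` and the toy data are invented, and it assumes every item was shown about equally often, so the total number of tasks is used as the denominator for every item.

```python
from collections import Counter


def approximate_counts(choices):
    """Approximate best-minus-worst score per item.

    choices: list of (best, worst) tuples, one per completed task.
    Assumes roughly equal exposure per item (an assumption, since the
    design is unknown), so each count is divided by the total task count.
    """
    n_tasks = len(choices)
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    items = sorted(set(best) | set(worst))
    # Counter returns 0 for items never picked as best (or as worst).
    return {item: (best[item] - worst[item]) / n_tasks for item in items}


# Toy data: four tasks, each recorded as (best, worst).
scores = approximate_counts([("A", "E"), ("A", "D"), ("B", "E"), ("C", "A")])
print(scores)
```

If the design is known, dividing each item's best-minus-worst count by the number of times that item was actually shown would give the exact counts score instead.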

Cheers,

-- Chris

Tixeerif commented 3 weeks ago

Hi! Apologies, I wasn't clear! I do also have the full design information, so can run the full logit / HB model.

I've had some success using the flipMaxDiff package that interfaces with Stan but I found it very slow when running enough iterations to converge.

If I only need the design order and the selections made, as described above, then I might be able to wrangle my data to fit. I had thought that the column headers and additional metadata (e.g., on the first two rows) were also needed. I'll give that a go!

Thanks for the link to your blog earlier too. Great lunchtime reading 👍🏼

cnchapman commented 3 weeks ago

If you use the functions as written then yes, you would need to provide similar column headers and the first three rows. But you can just use the pizza CSV rows and do minor replacement/augmentation (such as changing the item labels and adding/removing columns to match).

OR search for the CHO file documentation from Sawtooth and use that format instead. The pizza CSV format is probably easier though IMO.

Best!

-- Chris

Tixeerif commented 3 weeks ago

Super, thanks Chris. Busy few days coming up but I'll give it another go. Thanks for your help, it's greatly appreciated. Scott

Tixeerif commented 2 days ago

Just a quick one, Chris, to let you know that I got everything working. It runs really quickly on my machine and the results correlate really well with some commercial software I tested against. Thanks so much for your help!