bmkramer / 101innovations-survey-data

Stringing beads - identifying research workflows from tool usage data (clustering)

Statistics - Difference in tool usage across user groups #7

Status: Open (bmkramer opened this issue 8 years ago)

bmkramer commented 8 years ago

Compare tool usage (per research activity) for different user groups (discipline, research role, career length, country)

Related to https://github.com/bmkramer/101innovations-survey-data/issues/3

RMHogervorst commented 8 years ago

Idea 1: clustering of answers.
https://cran.r-project.org/web/views/Cluster.html (CRAN Task View on cluster analysis). We would probably need to recode the answers into 1 and 0 for every category, e.g.:

Name        MENDREAD   READCUBE   ...
person 1    1          0          ...
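A minimal sketch of that recoding in Python, assuming the raw answers are available as, per respondent, the set of tools ticked for one research activity. The respondents and their answers are made up for illustration; only the MENDREAD/READCUBE codes come from the example table, and `to_binary_matrix` is a hypothetical helper.

```python
# Hypothetical raw answers: for each respondent, the set of tools they ticked
# for one research activity (here: reading). Made-up data for illustration.
answers = {
    "person 1": {"MENDREAD"},
    "person 2": {"MENDREAD", "READCUBE"},
    "person 3": set(),  # ticked none of the listed tools
}


def to_binary_matrix(answers):
    """Recode multi-select answers into one 0/1 column per tool."""
    # Sorted union of all tools that appear, so the column order is stable.
    tools = sorted(set().union(*answers.values()))
    rows = {
        name: [1 if tool in chosen else 0 for tool in tools]
        for name, chosen in answers.items()
    }
    return tools, rows


tools, rows = to_binary_matrix(answers)
print("Name", tools)
for name, row in rows.items():
    print(name, row)
```

Note that this only creates columns for tools somebody actually ticked; if the full list of answer options in the survey is known, using that list for `tools` instead would keep unused tools as all-zero columns.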

We could then compute several distance measures on this 0/1 matrix and cluster on each of them.
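Two common distance measures for 0/1 rows, sketched in plain Python (the function names are hypothetical helpers, not from a package). Jaccard distance ignores tools that neither respondent uses, while simple matching counts those joint zeros as agreement, which can make everyone look similar when most people use only a few of many tools.

```python
def jaccard_distance(a, b):
    """1 - (tools both use) / (tools either uses); joint zeros are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two respondents who use none of the tools are treated as identical.
    return 0.0 if either == 0 else 1 - both / either


def simple_matching_distance(a, b):
    """Share of positions that differ (normalised Hamming); joint zeros count as agreement."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# Hypothetical 0/1 rows for two respondents over four tools.
p1 = [1, 0, 0, 1]
p2 = [1, 1, 0, 0]
print(jaccard_distance(p1, p2))          # 0.6666666666666667
print(simple_matching_distance(p1, p2))  # 0.5
```

Clustering on each of these (and comparing the results) is one way to check how sensitive the groups are to the choice of distance.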

RMHogervorst commented 8 years ago

Idea 2: some form of classification, e.g. random forests.
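To show the shape of the classification idea (predict a user group from the 0/1 tool-usage row), here is a deliberately simpler stand-in for a random forest: a nearest-centroid classifier in plain Python. The data, labels and helper names are hypothetical; a random forest (e.g. scikit-learn's `RandomForestClassifier`) could be trained on the same 0/1 matrix and labels instead.

```python
def centroids(rows, labels):
    """Mean 0/1 profile per group, i.e. the share of the group using each tool."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def predict(row, cents):
    """Assign the group whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda g: sum((x - c) ** 2 for x, c in zip(row, cents[g])))


# Hypothetical training data: tool-usage rows and each respondent's discipline.
rows = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
labels = ["sciences", "sciences", "humanities", "humanities"]
cents = centroids(rows, labels)
print(cents)                      # {'sciences': [1.0, 0.0, 0.5], 'humanities': [0.0, 1.0, 0.5]}
print(predict([1, 0, 1], cents))  # sciences
```

The centroids themselves are already a readable summary of how tool usage differs per group.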

jcolomb commented 5 years ago

If you are dealing with binary data (0 or 1), I think you will need special tools (and I think random forests do not deal with binary data).

For clustering, you often need to set the number of clusters you want a priori.
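To illustrate where the number of clusters comes in: a small agglomerative (average-linkage) clustering sketch over Jaccard distances, in plain Python. It keeps merging the two closest clusters and stops once `k` remain, so `k` is an input; alternatively one could build the full merge tree (dendrogram) and decide afterwards where to cut it. The data and helper names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard distance between two 0/1 rows (0.0 if both rows are all zeros)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1 - both / either


def average_linkage(rows, k):
    """Agglomerative clustering: merge the closest pair of clusters until k remain."""
    clusters = [[i] for i in range(len(rows))]  # start with one cluster per respondent

    def cluster_distance(pair):
        # Average linkage: mean distance over all cross-cluster respondent pairs.
        a, b = pair
        total = sum(jaccard(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Hypothetical 0/1 tool-usage rows for five respondents.
rows = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(average_linkage(rows, k=2))  # [[0, 1, 4], [2, 3]]
print(average_linkage(rows, k=3))  # [[0, 1], [2, 3], [4]]
```

Running it with several values of `k` shows how the grouping changes, which is exactly the choice that has to be justified.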

Maybe the easiest approach is to define the question you want answered (for example, the difference between humanities and sciences), then run a PCA and look for a difference on PC1. Careful: you need to set your questions before you touch the data, or you will end up p-hacking your data!
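A minimal sketch of the PCA approach with NumPy, assuming a 0/1 respondent-by-tool matrix and group labels fixed in advance (both made up here). PCA is computed via the SVD of the column-centred matrix; the sign of PC1 is arbitrary, so only the size of the gap between groups matters. Whether such a gap is meaningful would still need a test that is also chosen before looking at the data.

```python
import numpy as np

# Hypothetical 0/1 tool-usage matrix: rows = respondents, columns = tools.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=float)
# Group labels for the question fixed in advance (e.g. sciences vs humanities).
group = np.array(["sciences"] * 3 + ["humanities"] * 3)

# PCA via singular value decomposition of the column-centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]  # each respondent's score on PC1 (the sign of PC1 is arbitrary)
explained = S**2 / np.sum(S**2)  # share of total variance per component

group_means = {g: pc1[group == g].mean() for g in ("sciences", "humanities")}
print("PC1 mean per group:", group_means)
print("Share of variance on PC1:", explained[0])
```

Here PC1 carries about 57% of the variance and the two groups sit on opposite sides of it; with real survey data the share on PC1 is likely to be much lower.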