crtahlin closed this issue 10 years ago.
I am prototyping the graph in ggplot. Measurements are treated as factors (as.factor), and missing values are omitted, which produces these warnings:
Warning messages:
1: Removed 1 rows containing missing values (stat_summary).
2: Removed 1 rows containing missing values (geom_path).
The plot looks like this (showing only 4 symptoms):
The red squares are the medians at each measurement occasion. The lines connect the values of each individual. It is not very clear how the values change over time; there is too much clutter. I think the median values are the most revealing, not the lines themselves. Any opinions? Should I implement it as is? I guess how cluttered the result is depends on the data; with other data it might be quite revealing...
I think that we should do the following:
0 - Add a tab that displays this graph and call it "Distribution of the variables over time" (or think of a better name). Alternatively, simply append " - by measurement occasion" to the names of all the tabs where the analysis/summary is done separately for each measurement occasion; that would do as well.
1 - As this plot may or may not be informative, depending on the number of subjects/occasions, we give the user two choices through a drop-down menu:
1b - select a random subset (n=X) of the subjects to display
1c - draw several graphs, each displaying at most X patients
My code for 1b/1c is reported below (for one symptom, no Shiny features included).
1d - draw a simple graph with a boxplot for each measurement occasion; for the symptoms it looks OK.
############## code that should work, written for one symptom; only the file path in input$file1$datapath (below) needs to be fixed
# mock the Shiny input object with a plain list, so the code runs outside Shiny
input=vector("list")
input$file1=vector("list")
input$file1$datapath = "C:/Users/lara/Dropbox/medplot/ForSymptoms/DataEM.txt"
input$dateVar="Date"
input$patientIDVar="PersonID"
input$measurementVar="Measurement"
input$groupingVar="Sex"
data <- read.csv(input$file1$datapath, header=TRUE, sep="\t")
input$selectedSymptoms=names(data)[-c(1:8)]
# transform date information into R compliant dates
data["Date"] <- as.Date(data[,"Date"], "%d.%m.%Y")
dataFiltered=data
input$selectedSymptoms
j=10 # column index of the symptom to plot
matplot(dataFiltered[,input$measurementVar], dataFiltered[,j], lty=2, type="n", xlab="Measurement", ylab=names(dataFiltered)[j])
for(my.id in unique(dataFiltered[,input$patientIDVar])){
temp.data=dataFiltered[is.element(dataFiltered[,input$patientIDVar], my.id) ,]
#add a bit of noise on the x-axis
j.data=jitter(temp.data[,input$measurementVar])
matlines(j.data, temp.data[,j], lty=2, col=1:10)
matpoints(j.data, temp.data[,j], lty=2, pch=1)
}
j=10
#random sample a subset of patients, say 20
num.displayed=20
which.t0=which(dataFiltered[, input$measurementVar]==min(dataFiltered[, input$measurementVar], na.rm=T))
num.samples.t0=length(which.t0)
which.use=sample(num.samples.t0, num.displayed)
matplot(dataFiltered[,input$measurementVar], dataFiltered[,j], lty=2, type="n", xlab="Measurement", ylab=names(dataFiltered)[j]) # k is not defined yet here; type="n" draws nothing anyway
k=0
for(my.id in unique(dataFiltered[,input$patientIDVar])[which.use]){
k=ifelse(k<10, k+1, 1) # cycle through 10 colours/line types (use different colors and dashed lines) - set back to k=1 to get all black and solid lines
temp.data=dataFiltered[is.element(dataFiltered[,input$patientIDVar], my.id) ,]
matlines(temp.data[,input$measurementVar], temp.data[,j], lty=k, col=k)
matpoints(temp.data[,input$measurementVar], temp.data[,j], lty=k, pch=1)
}
############# end of random sample example
############### select the maximum number of patients per graph; patients are grouped by the value of the variable at t=0
j=10
which.t0=which(dataFiltered[, input$measurementVar]==min(dataFiltered[, input$measurementVar], na.rm=T))
num.samples.t0=length(which.t0)
num.patients.per.graph=10
num.graphs=ceiling(num.samples.t0/num.patients.per.graph)
my.breaks=round(seq(1, num.samples.t0+1, length.out=num.graphs+1)) # num.graphs+1 breaks give num.graphs groups of at most num.patients.per.graph
#par(mfrow=c(ceiling(num.graphs/2), 2)) # disabled: in the R graphics device this fails with "figure margins too large"; it should work in the browser if the figure is big enough
for(i in 1:num.graphs){
# which(rank(dataFiltered[which.t0,j])<=num.samples.t0/4)
which.names.use=dataFiltered[which.t0,input$patientIDVar][rank(dataFiltered[which.t0,j], ties.method="first")>=my.breaks[i] &
rank(dataFiltered[which.t0,j], ties.method="first")<my.breaks[i+1] ]
matplot(dataFiltered[,input$measurementVar], dataFiltered[,j], lty=2, type="n", xlab="Measurement", ylab=names(dataFiltered)[j])
k=0
for(my.id in which.names.use){
k=ifelse(k<10, k+1, 1) # cycle through 10 colours/line types
temp.data=dataFiltered[is.element(dataFiltered[,input$patientIDVar], my.id) ,]
j.data=jitter(temp.data[,input$measurementVar])
j.data.y=jitter(temp.data[,j])
#matlines(j.data, temp.data[,j], lty=k, col=k)
#matpoints(j.data, temp.data[,j], pch=1)
matlines(j.data, j.data.y, lty=k, col=k)
matpoints(j.data, j.data.y, pch=1)
}#end for my.id
}#end for i
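A possible alternative for the grouping step above, as a minimal sketch: `assignPanels()` is a hypothetical helper (not part of medplot) that orders patients by their value at the first measurement occasion and fills consecutive panels with at most `max.per.graph` patients each. The toy data are simulated and only mimic the PersonID/Measurement/symptom columns of DataEM.txt.

```r
# Hypothetical helper (not medplot code): assign each patient to a panel so that
# no panel holds more than max.per.graph patients. Patients are ordered by their
# value at the first measurement occasion, so similar baselines share a panel.
assignPanels <- function(data, idVar, measurementVar, valueVar, max.per.graph) {
  firstOccasion <- min(data[[measurementVar]], na.rm = TRUE)
  # which() drops rows where the measurement occasion is NA
  baseline <- data[which(data[[measurementVar]] == firstOccasion), ]
  # order by baseline value; missing baseline values go last
  ids <- baseline[[idVar]][order(baseline[[valueVar]], na.last = TRUE)]
  # consecutive blocks: the first max.per.graph patients go to panel 1, and so on
  data.frame(id = ids, panel = ceiling(seq_along(ids) / max.per.graph))
}

# toy data (simulated): 25 patients, 2 measurement occasions
set.seed(2)
toy <- data.frame(PersonID    = rep(1:25, each = 2),
                  Measurement = rep(c(0, 1), times = 25),
                  Fatigue     = sample(0:10, 50, replace = TRUE))
panels <- assignPanels(toy, "PersonID", "Measurement", "Fatigue", max.per.graph = 10)
table(panels$panel)
```

Unlike the `my.breaks` approach, this fills every panel except possibly the last one completely (here 10, 10 and 5 patients) instead of balancing the panel sizes. Each panel's IDs (`panels$id[panels$panel == p]`) could then replace `which.names.use` in the drawing loop.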
Code for ggplot (random sample included):
# load libraries
library(ggplot2)
library(reshape2) # provides melt()
# load data: keep only the call with the path on your machine
# Crt:
# dataTest <- read.csv("C:/Users/Crt Ahlin/Documents/Dropbox/medplot_shared_Crt/ForSymptoms/DataEM.txt",
#                      header=TRUE, sep="\t")
# Lara:
dataTest <- read.csv("C:/Users/lara/Dropbox/medplot/ForSymptoms/DataEM.txt",
header=TRUE, sep="\t")
# draw sample
sizeofSample <- 10
peopleInSample <- sample(unique(dataTest[,"PersonID"]), sizeofSample)
dataRandomSample <- dataTest[dataTest[, "PersonID"] %in% peopleInSample, ]
# prepare data
dataMelted <- melt(data=dataRandomSample, id.vars=c("Measurement", "PersonID"), measure.vars=c("Fatigue","Malaise","Headache", "Insomnia") )
# set some variables as factors
dataMelted[,"PersonID"] <- as.factor(dataMelted[,"PersonID"])
dataMelted[,"Measurement"] <- as.factor(dataMelted[,"Measurement"])
# code to draw graph
# define x, y axis, groups, coloring
p <- ggplot(data=dataMelted, aes(x=Measurement, y=value, group=PersonID, colour=PersonID)) +
# draw points, draw lines, facet by symptom, use black & white theme
geom_point() + geom_line() + facet_grid(variable~.) + theme_bw() +
# add summary statistics at each point
stat_summary(aes(group=1), geom="point", fun=median, shape=15, size=5, colour="red") # use fun.y= instead of fun= with ggplot2 < 3.3.0
# plot
print(p)
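For option 1c, the grouping can also be done directly in ggplot with faceting. A minimal sketch on simulated data (the `panel` column, `max.per.graph` and the data are assumptions, not medplot code): patients are ranked by their Fatigue value at the first occasion, cut into blocks of at most `max.per.graph` patients, and each block is drawn in its own facet.

```r
library(ggplot2)

# simulated data: 25 patients, 3 measurement occasions, one symptom
set.seed(3)
n.patients <- 25
max.per.graph <- 10
toy <- data.frame(
  PersonID    = rep(1:n.patients, each = 3),
  Measurement = rep(0:2, times = n.patients),
  Fatigue     = sample(0:10, 3 * n.patients, replace = TRUE)
)

# rank patients by their value at the first occasion, then cut into blocks
baseline <- toy[toy$Measurement == min(toy$Measurement), ]
orderedIDs <- baseline$PersonID[order(baseline$Fatigue)]
panelOf <- ceiling(seq_along(orderedIDs) / max.per.graph)
# every row of a patient gets that patient's panel number
toy$panel <- panelOf[match(toy$PersonID, orderedIDs)]

toy$PersonID <- factor(toy$PersonID)
toy$Measurement <- factor(toy$Measurement)

# one facet per block of patients; lines connect each patient's values
pPanels <- ggplot(toy, aes(x = Measurement, y = Fatigue, group = PersonID, colour = PersonID)) +
  geom_point() + geom_line() +
  facet_wrap(~ panel) +
  theme_bw() + theme(legend.position = "none")
print(pPanels)
```

With several symptoms, the same `panel` column could be combined with the symptom facet, e.g. `facet_grid(variable ~ panel)` on the melted data.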
Answers:
> 0 - add a tab in which we display this graph and call it: Distribution of the variables over time - or think a better name. or simply add ( - by measurement occasion in all the tabs where the analyses/synthesis is done for each time occasion separately - would do this)
OK. Almost all tabs (except Timeline and Distribution: by grouping variable) are actually by measurement occasion, so I will add ": by measurement occasion" to all of them and name the new one "Distribution of the variables: over time", to keep things consistent.
> 1 - as this plot might or might not be informative, depending on the number of subjects/times, we give the user two choices, through a drop down menu:
> 1b - select a random subset (n=X) of the subjects to display
> 1c- do many graphs, in which at most X patients are displayed
> my code for 1b/c is reported below (for 1 symtp. , no shiny features included )
OK, I can do this relatively quickly in ggplot (I already have an idea of how to do it). I can also quickly implement #63 and any other additional faceting in ggplot, so I am rooting for the ggplot solution.
> 1d - do a simple graph that includes a boxplot for each measurement time- in the case of the sympt. it looks ok
Great. This will probably be much less cluttered and more helpful. Should it go on the same (new) tab as the profile plot?
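A minimal ggplot sketch of the 1d boxplots, faceted by symptom like the profile plot. The data are simulated and only mimic the PersonID/Measurement/symptom columns; the long format is built in base R here instead of with melt(), so no extra package is needed.

```r
library(ggplot2)

# simulated wide data: 30 patients, 3 measurement occasions, 2 symptoms
set.seed(1)
n.patients <- 30
dataWide <- data.frame(
  PersonID    = rep(1:n.patients, each = 3),
  Measurement = rep(c(0, 1, 2), times = n.patients),
  Fatigue     = sample(0:10, 3 * n.patients, replace = TRUE),
  Malaise     = sample(0:10, 3 * n.patients, replace = TRUE)
)

# long format: one row per patient x occasion x symptom
symptoms <- c("Fatigue", "Malaise")
dataLong <- data.frame(
  PersonID    = rep(dataWide$PersonID, times = length(symptoms)),
  Measurement = factor(rep(dataWide$Measurement, times = length(symptoms))),
  variable    = rep(symptoms, each = nrow(dataWide)),
  # unlist() stacks the symptom columns in the same order as `variable`
  value       = unlist(dataWide[, symptoms], use.names = FALSE)
)

# one boxplot per measurement occasion, one facet row per symptom
pBox <- ggplot(dataLong, aes(x = Measurement, y = value)) +
  geom_boxplot() +
  facet_grid(variable ~ .) +
  theme_bw()
print(pBox)
```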
1 - Yes, I would make it selectable within the tab (default: display all subjects; if selected, display a subset).
2 - This type of graph by dates would not make much sense; it could make sense for days since enrollment.
1d - If space permits, I would put the boxplot on the same tab.
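For a "days since enrollment" axis, one possible definition (an assumption: enrollment is each patient's earliest recorded date) can be computed per patient with base R's ave(). The dates use the same "%d.%m.%Y" format as DataEM.txt, but the values are made up.

```r
# made-up example: two patients with visits on different dates
toy <- data.frame(
  PersonID = c(1, 1, 1, 2, 2),
  Date     = as.Date(c("03.02.2014", "17.02.2014", "03.03.2014",
                       "10.02.2014", "24.02.2014"), "%d.%m.%Y")
)
# earliest date of each patient, repeated on all of that patient's rows
firstDate <- ave(as.numeric(toy$Date), toy$PersonID,
                 FUN = function(d) min(d, na.rm = TRUE))
toy$DaysSinceEnrollment <- as.numeric(toy$Date) - firstDate
toy$DaysSinceEnrollment
# 0 14 28  0 14
```

The new column could then be used on the x-axis of the profile plot as a continuous variable instead of the measurement factor.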
In "Distribution of the variables over time" we should also add a table that summarizes the data over all measurement occasions, something like the table in "Distribution of the variables - by measurement occasion", just with all the data pooled together.
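A minimal sketch of such a pooled table, assuming N, median and quartiles are the statistics of interest (the existing by-occasion table may report others). `pooledSummary()` is a hypothetical helper and the toy data are made up; pooling simply means ignoring the Measurement column.

```r
# Hypothetical helper: summary statistics per variable, pooled over all occasions
pooledSummary <- function(data, variables) {
  t(sapply(data[, variables, drop = FALSE], function(x) {
    # default (type 7) quantiles, ignoring missing values
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    c(N = sum(!is.na(x)), Median = q[2], Q1 = q[1], Q3 = q[3])
  }))
}

# made-up data: the Measurement column is ignored, so all occasions are pooled
toy <- data.frame(Measurement = c(0, 0, 1, 1, 2),
                  Fatigue     = c(1, 3, 5, NA, 7),
                  Malaise     = c(0, 0, 2, 2, 4))
res <- pooledSummary(toy, c("Fatigue", "Malaise"))
res
```

The result has one row per variable (Fatigue, Malaise) and one column per statistic, so it could be rendered directly as a table under the plots.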
I have implemented 1, 1a and 1b in cf30eb20b781c97c1b8d06e4b8041f94718ad4d0. Since the graph takes quite a long time to plot, I have made 1a the default choice. If needed, I can change that quickly.
1d and the table are yet to be implemented.
1d (boxplots) done in 023582795293e773eeb23c641ee7ed790d6bb046. I have put them in a separate tab with the working name "Variables: over time w boxplots". Should we perhaps rename all the "Distribution of variables: ..." tabs to "Variables: ..." to take less space?
Table remains to be implemented.
Tables under the boxplots were done in #80; closing.
Add a profile plot to the timeline. A profile plot is a longitudinal plot in which a line connects all measurement points of each subject. The horizontal axis should show the measurement occasions.
Try using ggplot, as the plot should (optionally?) be faceted by symptom.