angelovangel / tidydrc

tidy modelling of dose-response relationships with the drc package
MIT License
4 stars 2 forks source link

ED50 extrapolation is not where the curve intersects y=50 #1

Open wcwr opened 3 years ago

wcwr commented 3 years ago

Hey Angel - Thanks for the great work on this package!

This issue relates to the ED50 calculation being off from where the curve intersects at y=50. Here, I use normalized data (constrained within 0-100) and attempt to extract an ED50 closer to the ground truth.

library(tidydrc)
#Make a data frame of dose-response results (response signifies pre-normalized %Cell Death values)
#As a ground truth, we have dose of 59.375 equals response of 50.000
mydr <- data.frame("dose"=c(1900,950,475,237.5,118.75,59.375,29.68,14.84,7.42,3.71), 
                   "response"=c(98.38,96.267,93.300,91.1,53.56,50.000,44.2,24.1,15.5,9.01))

#Apply the LL.4 model to fit a log-logistic curve
mydr.LL4 <- tidydrc_model(mydr, dose, response, model=LL.4())

#Extract the EC50 value
mydr_EC50 <- as.data.frame(as.numeric(mydr.LL4[[4]][[1]][4,2]))

#Plot the curve
#ed50=TRUE plots a dashed black line according to the extrapolated EC50 
#ground truth lines are in red. y-axis=50, and x-axis=59.375

tidydrc_plot(mydr.LL4,ed50=TRUE) + 
  scale_x_log10(labels=log10) + 
  xlab("Log10[concentration]")+
  ylab("% Cell Death") +
  geom_hline(aes(yintercept=50), linetype=1, color="red") +
  geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
  geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
  geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3))))

image

We can see the dashed line is off, calling the ec50 as 67.29, when we know it is 59.375 Could this be due to extrapolating 50% of the curves maximal effect, and not considering the constriants of 0-100 imposed by the pre-normalized data?

#Look at the predicted values of the fit to see if we can get closer to the 'true' ec50 value
mydr_predvals <- mydr.LL4$pred[[1]]
plot(mydr_predvals$pred)

#From the predicted values, get the dose where the response is closest to 50
new_EC50 <- mydr_predvals[which.min(abs(50-mydr_predvals$pred)),1] #note which.min alone returns the *position*, so doing it like this returns the actual value

#Re-make the plot but add new_EC50 (green) and see if it gets any closer
tidydrc_plot(mydr.LL4,ed50=TRUE) + 
  scale_x_log10(labels=log10) + 
  xlab("Log10[concentration]")+
  ylab("% Cell Death") +
  geom_hline(aes(yintercept=50), linetype=1, color="red") +
  geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
  geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
  geom_vline(aes(xintercept=new_EC50), linetype=5, color="darkgreen") +
  geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3)))) +
  geom_text(aes(x=10.9,y=94, label=paste0("new_EC50: ", round(new_EC50,3))))

image

The new EC50 gets much closer. Thoeretically, some datasets may have a response point closer to 50 than exists in the predicted values

#Merge the original response with the predicted response and take whichever value of the combined results is closest to 50:
merged <- data.frame("dose"=abind(mydr_predvals$dose,mydr$dose), "response"=abind(mydr_predvals$pred, mydr$response))
new2_EC50 <- merged[which.min(abs(50-merged$response)),1]

#Re-make the plot and add the new2_EC50 value
tidydrc_plot(mydr.LL4,ed50=TRUE) + 
  scale_x_log10(labels=log10) + 
  xlab("Log10[concentration]")+
  ylab("% Cell Death") +
  geom_hline(aes(yintercept=50), linetype=1, color="red") +
  geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
  geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
  geom_vline(aes(xintercept=new_EC50), linetype=5, color="darkgreen") +
  geom_vline(aes(xintercept=new2_EC50), linetype=5, color="cyan", size=1.2)+
  geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3)))) +
  geom_text(aes(x=10.9,y=94, label=paste0("new_EC50: ", round(new_EC50,3)))) +
  geom_text(aes(x=10.6,y=88, label=paste0("new2_EC50: ", round(new2_EC50,3))))

image

Now we've achieved the ground truth. I realize that the likelihood of a single point in the observed data being closer to 50 than the best predicted value is pretty low, but merging both datasets and looking for the closest dose where response=50 seems to fix this. Either way, the two values aren't super different.

Best, Charlie

BradyAJohnston commented 3 years ago

Looking at the fit-curve on the plot, the curve itself doesn't flatten until above 100, and the lower limit is above 0.

While the data might be normalised between 0 and 100, the fitted model isn't. The EC50 you are getting (67.3) is the correct one from the model, working from the data provided. If you wanted a model that was more normalised to 50% cell death being the 50% response in the model, you will need more data points to the right-hand side of the plot that flattens out the response at 100 - and more to the left that flattens it better around 0.

See example below that uses your provided data (with some additional points below and above) that fits the model with an EC50 that better represents the "50 % Response".

library(tidydrc)
#Make a data frame of dose-response results (response signifies pre-normalized %Cell Death values)
#As a ground truth, we have dose of 59.375 equals response of 50.000
mydr <- data.frame("dose"=c(7600, 3800, 1900,950,475,237.5,118.75,59.375,29.68,14.84,7.42,3.71, 1.855, 0.96, 0.45, 0.23), 
                   "response"=c(99.50, 99.01, 98.38,96.267,93.300,91.1,53.56,50.000,44.2,24.1,15.5,9.01, 4.5, 2, 1, 0.5))

#Apply the LL.4 model to fit a log-logistic curve
mydr.LL4 <- tidydrc_model(mydr, dose, response, model=LL.4())

#Extract the EC50 value
mydr_EC50 <- as.data.frame(as.numeric(mydr.LL4[[4]][[1]][4,2]))

#Plot the curve
#ed50=TRUE plots a dashed black line according to the extrapolated EC50 
#ground truth lines are in red. y-axis=50, and x-axis=59.375

tidydrc_plot(mydr.LL4,ed50=TRUE) + 
  scale_x_log10(labels=log10) + 
  xlab("Log10[concentration]")+
  ylab("% Cell Death") +
  geom_hline(aes(yintercept=50), linetype=1, color="red") +
  geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
  geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
  geom_text(aes(x=10,y=100,label=paste0("Fitted_EC50: ",round(mydr_EC50,3))))

image

The problem is not in the package or the model - but the slightly-incomplete data that the model was working from.

wcwr commented 3 years ago

@BradyAJohnston

Thanks for this reply!

Yes, I also found that when having a dataset whose lowest value is at 0 and whose highest is at 100 does capture a true EC50. The problem being that in real-world experiments this is not always going to be the case. In high-throughput settings experiments are usually limited by the total amount of doses that are achievable. The design is definitely to attempt the lowest concentration to approach zero (%cell death, for example) and the highest 100, but that certainly isn't always the case.

That being said, if the proper controls were used for normalization it would not matter if the highest point was, say, 60%. We can consider a common example in dose-response screening where the highest doses of a drug still don't kill the cells:

library(tidydrc)
#Make a data frame of dose-response results (response signifies pre-normalized %Cell Death values)
#As a ground truth, we have dose of 59.375 equals response of 50.000
mydr <- data.frame("dose"=c(76000,38000,19000,9500,4750,2375,1187.5,593,296.7,148.4,74.21,37.10,18.55,9.277,4.63), 
                   "response"=c(60,60,60,55,50,45,28,10,5,3.5,1,0.01,0.001,0,0))

#Apply the LL.4 model to fit a log-logistic curve
mydr.LL4 <- tidydrc_model(mydr, dose, response, model=LL.4())

#Extract the EC50 value
mydr_EC50 <- as.data.frame(as.numeric(mydr.LL4[[4]][[1]][4,2]))

#Plot the curve
#ed50=TRUE plots a dashed black line according to the extrapolated EC50 
#ground truth lines are in red. y-axis=50, and x-axis=59.375

tidydrc_plot(mydr.LL4,ed50=TRUE) + 
  scale_x_log10(labels=log10) + 
  xlab("Log10[concentration]")+
  ylab("% Cell Death") +
  geom_hline(aes(yintercept=50), linetype=1, color="red") +
  geom_vline(aes(xintercept=4750), linetype=1, color="red")+
  geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
  geom_text(aes(x=10,y=100,label=paste0("Fitted_EC50: ",round(mydr_EC50,3))))+
  geom_text(aes(x=13,y=96,label="true_EC50: 4750"))

image

The EC50 is the concentration that kills 50% of the cells, not the concentration that is 50% of that specific drug's maximum effect. If no controls are used for normalization, there is no choice but to take 50% of the max. Sometimes the minimum and maximum responses are even (erroneously) set as the 0 and 100% constraints.

Our example here represents something we see fairly frequently, and the true EC50 is not captured using this default method. It's not an issue of dataset completeness, but I guess probably one of 'take y=50? or take 1/2 max effect?' and I'm thinknig the answer would depend on whether the input is normalized or not.

In most cases I don't think there would be much of a difference, but in cases like these I'll just continue to implement the little 'add-on' mentioned previously to make sure the 'most' accurate EC50 is captured.

Thanks again, Charlie