SimonDedman / gbm.auto

Machine-learning Boosted Regression Tree software suite for species distribution modelling in R
https://doi.org/10.1371/journal.pone.0188955

gbm.factorplot: Pdp plot aesthetics help #81

Closed: nffarabaugh closed this issue 1 year ago

nffarabaugh commented 1 year ago

I have to run many models with many PDP plots, so I was wondering whether there is a way to alter some of the plot aesthetics in the code, for ease of use. First and foremost, I would like to be able to alter axis labels and titles, so that the shorthand I use while coding doesn't show up in final plots meant for presentation or publication. Secondly, I would like to be able to re-order the x axis for categorical variables so that they appear either 1) in order from high to low, or 2) in some other logical order I specify, such as factor level, rather than alphabetically.

For instance, here is the PDP plot that some code generates at present: [image]

I would like to be able to alter both the x axis title and the labels, so that my sites have proper, readable names, and I would like the sites to appear on the plot from high to low. Thanks very much!
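A minimal sketch of the two requests in base R, for illustration only. The data frame, its column names (site, y), the site codes, and the display names are all invented and are not gbm.auto output. It maps the coding shorthand to readable names and orders the factor levels from highest to lowest response rather than alphabetically.

```r
# Hypothetical partial-dependence values for a categorical predictor.
# Column names, site codes and values are invented for illustration.
pdp <- data.frame(
  site = c("nr", "sb", "el", "wc"),
  y    = c(0.12, -0.19, 0.19, -0.05),
  stringsAsFactors = FALSE
)

# 1) Map coding shorthand to presentation names (assumed lookup table).
site_names <- c(nr = "North Reef", sb = "South Bank",
                el = "East Lagoon", wc = "West Channel")
pdp$site_label <- unname(site_names[pdp$site])

# 2) Order the factor levels from highest to lowest y,
#    instead of the default alphabetical order.
pdp$site_label <- factor(
  pdp$site_label,
  levels = pdp$site_label[order(pdp$y, decreasing = TRUE)]
)

levels(pdp$site_label)
```

Any plotting function that respects factor level order should then draw the sites from high to low; a custom order can be given by writing the levels vector out by hand.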

nffarabaugh commented 1 year ago

Being able to call the plots and objects, and thus make changes to them myself using ggplot commands or similar, would be ideal. That way, anyone with slightly different issues could make bespoke changes to their plots. This might also avoid problems such as the one in my plot above, where several of the axis labels are missing because they are too long to fit in the plot area.
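As a rough sketch of what such a bespoke plot could look like, assuming the partial-dependence values are available as a data frame: the example below uses ggplot2 with invented data (the site and y columns, the site codes, and the display names are hypothetical, not gbm.auto output). It sorts the x axis from high to low, swaps in readable labels, sets the axis titles, and rotates the tick labels so long names still fit.

```r
library(ggplot2)

# Hypothetical PDP values for a categorical predictor (not gbm.auto output).
pdp <- data.frame(
  site = c("nr", "sb", "el", "wc"),
  y    = c(0.12, -0.19, 0.19, -0.05),
  stringsAsFactors = FALSE
)

# Assumed lookup from coding shorthand to presentation names.
site_names <- c(nr = "North Reef", sb = "South Bank",
                el = "East Lagoon", wc = "West Channel")

p <- ggplot(pdp, aes(x = reorder(site, -y), y = y)) + # high-to-low x order
  geom_point(size = 3) +
  scale_x_discrete(labels = site_names) +              # readable tick labels
  labs(x = "Site", y = "Marginal effect") +            # axis titles
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # fit long labels

# print(p) draws the plot; p can be modified further with the usual + syntax.
```

Because p is an ordinary ggplot object, further changes (themes, colours, facets) can be layered on without touching the code that produced the data.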

SimonDedman commented 1 year ago

The L831 block loops through the variables, draws the gaus_bestline plots, and then saves the same underlying data as gaus_bestline csvs. But the csv data column, y, isn't the same as the plot. The plot's Y axis labels seem to be normalised around 0, somehow. The Y range above is > 0.4, maybe 0.45 or so; the difference in the csv is 0.38. How does gbm.plot work? Its documentation says "common.scale Logical. If TRUE, a common scale is used on the y axis"; the default is TRUE, and it is already changed to FALSE in gbm.auto.

"Note that fitted functions are centered by subtracting their mean." For the csv, this leads to min -0.19, max 0.19, i.e. a bit less than for the plots. The difference between the two is that the plots are created by dismo::gbm.plot, whereas the csvs are created by gbm::plot.gbm. gbm.plot CALLS plot.gbm to get response.matrix: responses[[j]] <- response.matrix[, 2] - mean(response.matrix[, 2]), which is the same as what I just did in Excel, where I got a different result. NOT TRUE: my plot looks different to Nff's, and my plot & csv do align, so that's all groovy. I can therefore use the mean-centring approach to take the csv and make a lovely ggplot.

SimonDedman commented 1 year ago

L853 in gbm.auto: # plotgrid[,2] <- plotgrid[,2] - mean(plotgrid[,2]). I have evidently done all of this before... ugh. Added that line into gbm.auto so the centred values are exported with the csv.
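For illustration, a small sketch of what that centring line does, using an invented stand-in for plotgrid (the column names and numbers are made up and are not real gbm.auto output). Subtracting the mean shifts the fitted values so that they average zero, but leaves the spread (max minus min) unchanged.

```r
# Invented stand-in for the two-column grid (predictor value, fitted function)
# that the thread calls plotgrid; the numbers are not real gbm.auto output.
plotgrid <- data.frame(
  x = c("a", "b", "c", "d"),
  y = c(0.50, 0.31, 0.69, 0.42),
  stringsAsFactors = FALSE
)

# Spread of the fitted function before centring.
spread_before <- diff(range(plotgrid[, 2]))

# Centre the fitted function by subtracting its mean, as in the quoted line.
plotgrid[, 2] <- plotgrid[, 2] - mean(plotgrid[, 2])

# Spread after centring: the values have moved, the width has not.
spread_after <- diff(range(plotgrid[, 2]))

mean(plotgrid[, 2])          # approximately 0 (up to floating-point error)
c(spread_before, spread_after)
```

In other words, centring alone cannot turn one y range into a wider or narrower one, so a plot and a csv built from the same centred values should span the same distance.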

SimonDedman commented 1 year ago

see this, please make edits as you see fit.

SimonDedman commented 1 year ago

https://wilkelab.org/ungeviz/reference/geom_hpline.html

SimonDedman commented 1 year ago

gbm.factorplot updated and added into gbm.auto at L844 & L918