Suggestions to improve shiny application (PLPviewer)

AniekMarkus commented 4 years ago

Being relatively new to the shiny application, I wrote down some things I encountered that we can perhaps use to improve the shiny application in the future (PLPviewer):

In the summary tab: 'T size' is the size of the test set only, right? This is the same as the full set for external validation, but it is not clear what is reported when looking at summary results for the development dataset/internal validation. Suggestion: 'Test size' (it is not the true size of the target cohort, since we sometimes use a smaller random sample).
In the model tab -> model table: It would be useful to add total covariate counts even if the number of outcomes is < 10. I understand we omit all covariate results for privacy reasons, but this aggregated count is no problem, right? If this is also privacy sensitive, we can of course leave it out. Also make clear that a count of -1 means the value is protected for privacy reasons; this might not be intuitive when you look at it for the first time.
In the model tab -> model table: Instead of reporting 'count', isn't it more (or at least also) informative to report 'count' as a percentage of 'T size' (relative to the population size)? Or the overall mean (for binary variables this is the same)? Right now we only have the split by outcome and non-outcome.
In the model tab -> model table: 'Value' is very generic, perhaps call it 'Coefficient' or 'Coefficient value'?
In the model tab -> model table: the 'download model button' downloads the model with only non-zero coefficients. This is not expected as both are shown in shiny (and can be overlooked when the number of zero coefficients is small). Make it an option to download all or only non-zero coefficients? Otherwise my suggestion would be to just let it export all coefficients (easier to remove the rest later than other way around).
Another suggestion: add a tab with more information on databases, for example some characterization across databases with vertically: T size / O count / O rate (%) / Age (mean, in years) / Gender (%, male) / Medical history of X (%) / etc. and horizontally: db 1 / db 2 / etc.
In the performance tab: an idea is to add information for each plot on how it can (or should) be interpreted. This could be similar to the descriptions included in the protocols. Not all plots are very conventional, and this could make the presented results more accessible to a wider audience.
In the performance tab -> what is the difference between the 'predicted probability' axis in the box plot and 'predicted threshold' axis in the prediction score distribution? Also: the colours are reversed here for outcome and no outcome which is confusing… Plus: perhaps nice to use consistent terms throughout shiny application: outcome vs no outcome or outcome vs non-outcome?
Not sure how this can best be solved but we are quite inconsistent in names of databases, target and outcome cohorts which can be confusing. Can we maybe automatically include the definition of target and outcome cohorts used in the shiny application? So at least it is clear what a target or outcome cohort name means?
Not so necessary but an idea: Is it possible to include a time-stamp of when each analysis was last updated? We update results quite regularly during development and it's not clear from the shiny application when this was last done.
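
To make the two model table suggestions above more concrete (showing 'count' relative to the test size, and labelling the -1 placeholder), here is a rough sketch. It is written in Python purely for illustration and is not PLPviewer code; `format_count`, the label text and the example rows are hypothetical.

```python
def format_count(count, test_size):
    """Render a covariate count for display in the model table."""
    if count == -1:
        # -1 is the placeholder shown when a count is hidden for privacy reasons.
        return "censored (privacy)"
    if test_size <= 0:
        raise ValueError("test_size must be positive")
    pct = 100.0 * count / test_size
    return f"{count} ({pct:.1f}%)"


# Made-up example rows: (covariate name, count)
rows = [("covariate A", 120), ("covariate B", -1)]
for name, count in rows:
    print(name, format_count(count, test_size=1000))
```

Showing both the raw count and the percentage keeps the current information while making tables from databases of different sizes easier to compare.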

jreps commented 4 years ago

Here are my thoughts:

Test size - we need a single column for development/validation - but we could show the complete data size rather than the test size (and somewhere in the settings have the test/train split or add another column with performance sample %)?

In the model tab -> model table - I censored all results when counts were low in the transportPlp() function so that there are no privacy issues. Shiny just shows the results people send, so the edit would be outside shiny.
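
The kind of low-count censoring described here could be sketched as follows. This is an illustrative Python sketch, not the actual transportPlp() code; the function name, the dictionary layout and the default threshold of 10 are assumptions.

```python
def censor_low_counts(counts, min_cell_count=10, placeholder=-1):
    """Return a copy of `counts` with values below the threshold replaced.

    Assumption: every count strictly below `min_cell_count` is censored
    and shown as `placeholder` (-1, matching what the viewer displays).
    """
    return {
        name: (placeholder if value < min_cell_count else value)
        for name, value in counts.items()
    }


# Made-up example: one small count, one large count.
print(censor_low_counts({"covariate A": 3, "covariate B": 25}))
```

Because the censoring happens before results are shared, anything shown for a censored row (including a total) would have to be computed and approved at that earlier step.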

model tab -> model table - I kept it like this to present information without the table getting too big but there are tables where you can add/remove columns that we can use now. If we transfer to that table type we can add the extra columns. Would be useful to add model setting information as well.

In the model tab -> model table: 'Value' was used because it is only a coefficient for glm. We could rename it to variableImportance, but that seems like a long name and isn't quite true for glm unless we calculate the variable importances.

In the model tab -> model table: at one point we only had the non-zero entries in shiny due to speed issues and size, this is just a result of that. Can be removed.

Database tab - yeah, I suggested this the other day, we need information about the databases

In the performance tab: helper sounds like a good idea, and the colors were due to R factors being alphabetic (can be manually edited but it wasn't a priority for me at the time). Martijn mentioned using different colors that are better for black/white print as well - would be good to change.
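
The colour swap comes from assigning colours by the sorted order of the group labels, so it can be avoided by keying colours on the label itself. A small Python sketch of the idea (the labels and hex colours below are placeholders chosen for illustration, not the ones used in the viewer):

```python
PALETTE = ["#0072B2", "#D55E00"]  # arbitrary example colours


def colours_by_sorted_order(labels):
    """Assign colours by alphabetical position, like a default factor ordering."""
    return dict(zip(sorted(labels), PALETTE))


# Two plots with slightly different labels for the same groups.
plot_a = colours_by_sorted_order(["Outcome", "No outcome"])
plot_b = colours_by_sorted_order(["Has outcome", "No outcome"])
# "No outcome" ends up with a different colour in each plot.
print(plot_a["No outcome"], plot_b["No outcome"])

# Fix: one explicit label-to-colour mapping shared by all plots.
GROUP_COLOURS = {"Outcome": "#D55E00", "No outcome": "#0072B2"}


def colours_by_label(labels):
    """Look up each label in the shared mapping, independent of ordering."""
    return {label: GROUP_COLOURS[label] for label in labels}


print(colours_by_label(["Outcome", "No outcome"]))
```

A shared mapping also forces one set of group labels across plots, which addresses the outcome vs non-outcome wording point at the same time.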

Names - where are the inconsistencies? Do you want a helper '?' with a pop-up saying what the terms mean?

time-stamp - could add to the summary when we use the better table - I think the results should have a time stamp in them somewhere.

AniekMarkus commented 4 years ago

Thanks for replying to my comments.

Test size - complete data size + % that is used for testing, sounds good.

Model table - okay I understand, still something we could think about changing there I guess.

Model table - a more flexible table sounds good, I get that you're trying to avoid overload. If we can do both that would be great; otherwise I think count/data size is more informative than just count and we could consider replacing it.

Model table - you're right about that. I agree we should keep it as standardised as possible for different types of models. I assume we return something like variable importance for other types of models (e.g. random forest)? I will think about this issue and let you know if I have ideas.

Model table - great, easy fix I think.

Databases - nice.

Performance - yes definitely get that, good to change these two in one go now.

Names - this is not due to the way we've set up the shiny application; it occurs while importing the results. Within one shiny application we have validation databases and target cohorts with different spellings, which makes it confusing (e.g. covid-19 simple models). Perhaps we can come up with an idea to avoid that. In addition, I thought we could make the target/outcome cohort definitions accessible in shiny too (e.g. the text part from ATLAS).

Time-stamp - yes they probably have that, would be helpful I think to add if possible.

Let me know if I can help out.

jreps commented 4 years ago

I've added most things (except a time stamp) to the latest development branch commit. I'm going to also add a few more improvements such as displaying the hyper-parameter search and attrition.

jreps commented 4 years ago

Added a date stamp - all the edits that can be done in shiny have now been added.

OHDSI / PatientLevelPrediction, issue #163