Closed AniekMarkus closed 4 years ago
Here are my thoughts:
Test size - we need a single column for development/validation - but we could show the complete data size rather than the test size (and somewhere in the settings have the test/train split or add another column with performance sample %)?
In the model tab -> model table - I censored all results when counts were low in the transportPlp() function so that there is no privacy issues. Shiny just shows the results people send, so the edit would be outside shiny.
model tab -> model table - I kept it like this to present information without the table getting too big but there are tables where you can add/remove columns that we can use now. If we transfer to that table type we can add the extra columns. Would be useful to add model setting information as well.
In the model tab -> model table: value was used as it is only a coefficient in glm. Could rename variableImportance but that seems like a long name and isn't quite true for glm unless we calculate the variable importances.
In the model tab -> model table: at one point we only had the non-zero entries in shiny due to speed issues and size, this is just a result of that. Can be removed.
Database tab - yeah, I suggested this the other day, we need information about the databases
In the performance tab: helper sounds like a good idea, and the colors were due to R factors being alphabetic (can be manually edited but it wasn't a priority for me at the time). Martijn mentioned using different colors that are better for black/white print as well - would be good to change.
Names - where are the inconsistencies? Do you want a helper ? with a pop up saying what the terms mean?
time-stamp - could add to the summary when we use the better table - I think the results should have a time stamp in them somewhere.
Thanks for replying to my comments.
Test size - complete data size + % that is used for testing, sounds good.
Model table - okay I understand, still something we could think about changing there I guess.
Model table - a more flexible table sounds good, I get that you're trying to avoid overload. if we can do both that would be great, otherwise I think count/data size is more informative than just count and we could consider replacing it.
Model table - you're right about that. I agree we should keep it as standardised as possible for different types of models. I assume we return something like variable importance for other types of models (for e.g. random forest)? will think about this issue, will let you know if I have ideas.
Model table - great, easy fix I think.
Databases - nice.
Performance - yes definitely get that, good to change these two in one go now.
Names - this is not due to the way we've set up the shiny application, but it occurs while importing the results. inside a shiny application we have validation databases and target cohorts with different spellings, which makes it confusing (e.g. covid-19 simple models). perhaps we can come up with an idea to avoid that. next to that, I thought we could make the target/outcome cohort definitions accessible in shiny too (e.g. the text part from ATLAS).
Time-stamp - yes they probably have that, would be helpful I think to add if possible.
Let me know if I can help out.
I've added most things (except a time stamp) to the latest development branch commit. I'm going to also add a few more improvements such as displaying the hyper-parameter search and attrition.
added date stamp - all the edits that can be done in shiny have been added.
Being relatively new to the shiny application, I wrote down some things I encountered that we can perhaps use to improve the shiny application in the future (PLPviewer):