JOSS Review - Githubissues

Hi @VincentAlcazer,

Thank you for your submission to JOSS and for inviting me to review. I think StatAid is a great app, and I wish I had come across it sooner!!

Here are my comments and suggestions:

Regarding the app

Introduction a. The text bleeds off of the grey content background (noticed both on Chrome and Safari browsers):

Data loading In addition to the sample dataset already provided, I was able to explore the app using this penguin dataset as well. It uploaded and worked well.

a. Consider emphasizing the sentence describing the file types supported by your app, so it stands out for new users: e.g. make the "...your dataframe should be saved as a tab (.txt/.tsv), comma or semicolon (.csv) delimited file" bold or italic or add some sort of emphasis on this key point!

b. Regarding the example dataset: In the Readme or vignette/tutorial, consider giving a slightly more detailed explanation of each variable. Also, consider your target audience....if it's worldwide beyond France, you should think about changing the variable name "Sexe" to "Sex".

c. Enable horizontal scrolling in the table view because columns are otherwise cut off to the right:

They remain cut off even when I zoom out a lot:

Data exploration a. In "Categorical variables" exploration, there is a very useful "Summary table". It's showing the breakdown by only one variable, however, and not both variables plotted (for e.g. no FAB in the screenshot below):

b. In the Descriptive table, I think it's great and I appreciate you put in a short explanation of the methods used. Suggestions for the methods paragraph, below, are:

Methods : Categorical variables are expressed as n (%) and compared with the Chi-squared test or its non-parametric alternative Fisher's test with simulated p-values. Numerical variables are expressed as mean (standard-derivation) or median [IQR] and compared with else Welch's t-test (or its non-parametric alternative Wilcoxon's rank-sum test) or ANOVA (or its non-parametric alternative Kruskal-Wallis test) where appropriate. Variables with >80% missing are removed from the analysis. P-values are not adjusted.

Change standard-derivation to standard deviation, IQR to interquartile range, and else to either above.

c. In custom graphs, the text might be overlaid on the graph itself...so possibly have an option to increase Y-axis limits? I tried increasing the Y axis limits, but didn't see a change in the plot:

Paired-data analysis section a. In the data format, you state: "each row correspond to an observation (sample/patient), each selected variable/column to a different timepoint of the same variable (e.g. Concentration_day0, concentration_day15, concentration_day30...)"

The example dataset you provided doesn't have columns corresponding to the names above. I know users will have their own datasets with paired/time-dependent columns, but for the sake of a good explanation, your sample dataset should have columns which can be used in this case, and whose column names are identifiable in the example you give above. Doing so will be much more intuitive for new users.
Univariate analysis a. I could run the linear model, but not GAM or LOESS in the Continuous Univariate analysis section:

The error I get is:
```
Warning: Error in eval: object 'Age' not found
```
I'll check to see if there's an outdated package at my end causing this issue. If so, others might comes across this issue as well.

Points on the paper, documentation, and tests a. Your statement of need says "The package provides all the tools necessary to perform a complete data analysis in an intuitive ready-to-use graphical interface. Any user with no coding skill or no software with paid-license available can easily perform all the steps of a good statistical analysis, from data-exploration/quality check to multivariate modeling."

I think this is an excellent application that provides essential and core statistical analytical tools, however, I don't think it provides "all the tools for a complete data analysis". I'm sure you'll agree that there are many different types of analyses that suit different situations (and have various assumptions), for e.g. mixed-effects models might in many cases be more appropriate than standard linear regression models in health data analyses. It's great that you later specify that users can request new features (further adding emphasis to the point above that not all tools are included) - I think this evolving aspect, in combination with the existing features make it a solid package. I therefore recommend you to update your Statement of need to reflect your package's use better.

While I see how useful this application can be (and believe it's been used as you say), I also agree with Adithi regarding prior use of this software: if you state that it's been used by researchers and doctors, it helps to have a citation, else it's not verifiable.

Btw, regarding other apps that do the similar things, I came across other shiny apps that provide various aspects of statistical analysis and visualization, for e.g. https://pkgs.rsquaredacademy.com/, but your app is unique in that it incorporates various steps of the analytical workflow in one package itself.

b. In both the paper and README, under the features section, there is the point "Paired data analysis (case-control studies, repeated measures)". Case control studies are a type of observational study and it's not necessary that paired data analysis will always be used for this kind of study (for e.g., even logistic regression can be used, depending on the type of case-control study, matched pairs vs not, etc).

c. One important thing about the analyses you've included: there are no references about functions from various packages (or base R) that are used for statistical analyses, nor any references about the statistical methods themselves. The Contact/About page has the full package list though, and it would be a good place to add summarized information about specific functions used to perform various tests: e.g. glm(), coxph(), gam(), cor.test(), and any others you used. Your users can then look these functions up for further references and also get an idea of default function options and any other pertinent information. This could be crucial for people considering using your package for their research and publications.

d. In line with Adithi's suggestion, I recommend adding more examples to your documentation. Beyond making it easier for users to grasp what your package is doing and enabling them to see what's possible, it is also a requirement for JOSS submission that we have to check for. The addition of a short description of the dataset helped immensely, and adding more examples to the README would be useful while you work on your detailed tutorial on the side.

e. Regarding tests, you've got three tests pertaining to the app itself (as seen in test-golem-recommended.R and when devtools::test() is run). There aren't any tests for the functions in Functions.R themselves. It's recommended to have tests for functions in your package, especially as it starts adding more features and becomes more complex. They'll also lend further credibility to your package. A useful resource for this would be https://mastering-shiny.org/scaling-testing.html. I suggest adding a few more tests, as also recommended by Adithi, while understanding that it's an iterative process which will evolve even after the review is finished.

I've made minor typographical corrections to your app interface, documentation and paper, which you can see via my pull request. It was easier to do it like that than mention them separately.
Some suggestions which you could keep in mind as your package evolves. You don't need to incorporate them now, but it's food for thought:

a. X axis labels can go outside of plot margins if the labels are really large (for e.g. with the penguin dataset). Ideally the labels won't be so big, but just bringing this to your notice.

b. Using color palettes suitable for the colorblind people amongst us. Right now the plots will appear as below to people with Deuteranopia (red-green colorblindness, which is the more common form):

It's difficult to choose a colorblind friendly palette when there are many categories, but as seen in the topmost image, a palette could be chosen such that it is more easily distinguishable when there are fewer categories.

c. A way to export the tables and graphs would be great.

d. The figure options menu on the right overlaps with the plots depending on the zoom level: At 100% zoom:

At 90% zoom:

Please let me know if you have any questions on my comments above. I think this app is really good, and I can definitely see people using it to explore and work with their data.

Merci, Nistara

Additional comment: the references' capitalizations for R need to be fixed.

Dear @nistara ,

Thank you very much for your review and feedback. I am currently away from home and I will take the time to respond point by point as soon as possible.

Best regards,

Vincent

Dear @nistara ,

Thank you again for your complete review of my work and for your relevant comments. I just uploaded a corrected version (1.02) of StatAid on github and Shinyapps.io. I managed to take into account most of your comments and it really helped me improving the app. However, there is still some points (mostly in line with @adithirgis comments) that I could not manage to modify yet, but I will work on it for the future corrections:

The GAM / LOESS issues, also reported by @adithirgis: I could not manage to reproduce the error and tried to add the mgcv package as a systematic loading. Please tell me if it resolved the error for you.
The detailed tutorial/guide is still work in progress.
The need for more test, as commented in my answer, will also take some time.

You will find attached a detailed list of the corrections. Thank you again for your time.

Best regards,

Vincent

This has been fixed.

2.a. The sentence has been put in bold. 2.b. The variable name has been changed. A short description of the dataset has been added directly on the app. A more complete tutorial/description is currently under preparation for the readme, as suggested by @ adithirgis. A more complete description of the dataframe will be included as suggested. 2.c. The table has been fixed with a horizontal scrollbar.

3.a. Variable count has been added to the table. 3.b. The changes have been made. 3.c. The Y-limits control slider has been fixed.

4.a. Three simulated “concentration” data columns have been added to allow the testing of this feature with the base dataset.

5.a. @adithirgis reported the same issue. I could not manage to reproduce the error even after having installed StatAid on a new computer with no R nor Rstudio installation. I will try to see on a computer with a previously installed R version. I tried to add the mgcv package as a systematic import to see if this could correct the issue.

a. I agree with you and have modified the paper consequently. I am no longer talking about a “complete” data analysis, but kept “all the steps of a data analysis”, talking here more about the different steps (from data encoding, quality and distribution check to analysis itself) than features exhaustivity. As there is currently no evidence of StatAid use (and as this point as also been noted by @adithirgis), the sentence “It has already been used by multiple researchers and students for their PhD or medical doctors for their thesis” has been removed. b. This point has been modified as suggested: “- Paired data analysis (Repeated measures...)” c. The package list in the contact section has been complete modified to include a more complete description of packages and base R function usage. d. In line with 2.b, this will be included in the complete tutorial I am working on. e. Thank you for this reference from mastering-shiny.org. I read with interest the test section, together with multiple tutorials. I built StatAid in a way that each loaded dataset is formatted in a tidy way working with the functions I designed. The limiting step is currently the loading and formatting (if the data load correctly, the function will work), that’s why I find it hard to see which specific test for my functions I could use (according to test_that: I hardly see the point to add an “expect_equal” or “expect_error” in my functions). I took note that it is an important for enhancing contributions and code sharing and will try to figure out what relevant test I could add here.
All the correction have been merged. Thank you very much for this diligent proofreading.
a. Thank you for this suggestion. For long variable name, I usually invite to set only the external legend on the right or bottom for example and hide it on the x-axis. I will evaluate the necessity to add a function to cut variable names longer than n characters if users really want to see it on the x axis. I could also extend the angle to 90° so it does not longer go outside of the margins. b. This is indeed an important point. I have already added different color palettes for the custom graph section, but all the exploration modules are plotted with the base ggplot2 colors. I will work on a “colorblind” mode for these sections in the right panel. c. A download button has been added for all the tables. Graphs can already be downloaded with a right click. I would like to add a download button that allows to modify the size and dpi of the graph, but I have not found any module that provide such feature for the moment.

PS: The references' capitalization have been fixed!

Thanks @VincentAlcazer ! I'll go through the edits and get back to you shortly.

Dear @nistara ,

I hope your are doing well.

I wanted to inform you that I managed to correct the three remaining points of the review. You will find in the updated v1.03:

A correction of the GAM/LOESS and descriptive table issue
A quick-start user guide with usage example (can be found in the readme or here: https://github.com/VincentAlcazer/StatAid/blob/master/STATAID_QUICK_START_USER_GUIDE.pdf)
Implemented test for each function of the package.

Thanks again for your time and comments. I'm looking forward to your feedback.

Best regards,

Vincent

Dear @VincentAlcazer ,

Thanks for incorporating the edits even as you're busy with the ongoing pandemic. Hope you're doing well and are safe.

Comments on the edits: I'm confirming that the GAM/LOESS models are working for me, and that the quick-start user guide is accessible and helpful for people to get acquainted with your app.

6 b. This point has been modified as suggested: “- Paired data analysis (Repeated measures...)” I somehow still see this as Paired data analysis (case-control studies, repeated measures) in the latest paper.md and the app Intro page. In addition, there's a minor typo in this line in the app Intro page (repeated's spelling is wrong).

It'd be fine if you specified "matched case-control studies" above instead of just case-control. (As an aside, you might be interested in this paper: https://www.bmj.com/content/352/bmj.i969)

_6 e. Thank you for this reference from mastering-shiny.org. I read with interest the test section, together with multiple tutorials. I built StatAid in a way that each loaded dataset is formatted in a tidy way working with the functions I designed. The limiting step is currently the loading and formatting (if the data load correctly, the function will work), that’s why I find it hard to see which specific test for my functions I could use (according to test_that: I hardly see the point to add an “expect_equal” or “expecterror” in my functions). I took note that it is an important for enhancing contributions and code sharing and will try to figure out what relevant test I could add here.

It's great you added the tests! It's a good start and something that should help you going forward as you incorporate additional functionalities and tests to your app. I did get one error when I ran devtools::test() on your tests. It was for the regression_table_multi_lm test:

res$`Beta Coeff.`[1] not equal to 1.3.
1/1 mismatches
[1] 1.34 - 1.3 == 0.04

You might want to change

expect_equal(res$`Beta Coeff.`[1],1.3)

expect_equal(res$`Beta Coeff.`[1],1.34)

Or would that break the test at your end?

Also, in Paired-data analysis Methods, I assume the IC95 in mean (IC95) stands for "Intervalle de confiance". Considering the target audience of many countries, it would be preferable to change it to 95CI (confidence interval).

The following are minor comments that won't affect the conclusion of my review, but are put here for your consideration:

3.c. The Y-limits control slider has been fixed. Confirming that the slider's been fixed, however, the text still appears on the plots even after increasing the y-axis limits, and can be obscured by it.

6 b. As there is currently no evidence of StatAid use (and as this point as also been noted by @adithirgis), the sentence “It has already been used by multiple researchers and students for their PhD or medical doctors for their thesis” has been removed.

I still see this line even though I'm working on the latest file versions (to the best of my knowledge). I'm quite fine with you leaving it in since you've already clarified to us that it's not formally been cited, but it has been used.

8 b. This is indeed an important point. I have already added different color palettes for the custom graph section, but all the exploration modules are plotted with the base ggplot2 colors. I will work on a “colorblind” mode for these sections in the right panel.

Sounds good! This might be helpful: https://github.com/clauswilke/colorblindr I really appreciate that you have different color palettes to choose from. It's a great feature.

8 c. A download button has been added for all the tables. Graphs can already be downloaded with a right click. I would like to add a download button that allows to modify the size and dpi of the graph, but I have not found any module that provide such feature for the moment.

There's no download option for the summary table in Data Exploration -> Categorical variables.

Finally, in your custom plot, the order of the variables in the plot is different from that of the legend. So visually, they're in reverse order.

Well, that's it from me! Excellent work on this app, and I look forward to using it.

Best wishes, Nistara

Dear @nistara ,

The paper.md was not up-to-date, sorry for this oversight. I think there was a conflict when I merged the pull request. It should now be online (correcting the matched case-control study and the sentences I had modified), together with the introduction correction and the 95CI (I tried to track all the French abbreviations but this one stood under my radars!). The test has also been corrected.

Concerning the minor corrections: The X-variables order has been corrected. I will take the time to modify the plot function so that statistics will move with the y-axis limits.

Thank you again for your time and thorough your review.

Best regards,

Vincent

Dear @VincentAlcazer,

Perfect! Thank you for incorporating the suggestions. My last minor comment is for you to update the Readme's Paired data analysis point so that it's consistent with your paper and app Intro.

I'm going to go ahead and close this issue. Thank you for inviting me as a reviewer: it was a pleasure to check this package out and explore its many functionalities.

Best wishes, Nistara

VincentAlcazer / StatAid

JOSS Review #2

Regarding the app