Tpatni719 / gsMAMS

GNU General Public License v3.0

JOSS Review: Paper- Quality of writing #17

Closed njtierney closed 6 months ago

njtierney commented 7 months ago

The JOSS Guidelines state:

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)

There are several writing improvements that I believe should be required for a journal publication of this paper. I have tried to list them for the key components of the paper, but have not comprehensively listed every single change that I think is required. For instance, I have mentioned that there are standard ways to refer to functions and arguments using backticks or quotes, but I have not listed every instance that needs changing; hopefully the pattern should be clear.

Overall I think that this paper requires improvement in terms of language, structure, and reproducibility. I will give a few examples of these improvements in this section.

Language and structure in the paper

The paper has a few spelling mistakes - for example: "chararcteristics", "Compuational", "Desinging".

The paper needs editing for structure and writing quality. For example:

Based on the simulation results, the probability of success/power at the first stage is 32.7\% and at the second stage is around 58.63\%. Therefore, the overall power is around 90\%.

Using \ inside a sentence is not something I would expect in journal-standard writing. I suggest replacing instances of this with the word "or", e.g., "Based on the simulation results, the probability of success (or power)..."

Additionally, using words like "around" here is informal, and I would replace instances of "around" with "approximately", or remove the word entirely.

Traditional two-arm randomized control trials are not an optimal choice when multiple experimental arms are available for testing efficacy.

The authors should provide a citation for this statement, and should state why they are inefficient.

Some packages that are available in R have limitations either in the number of treatment arms that can be incorporated in the package, the number of interim analyses that can be implemented in the package, or the different kinds of outcomes that the package can handle, but the MAMS package works well both for multiple treatment arms and multiple stages.

This is a very long sentence.

But the computational effort of obtaining stopping boundaries is very high when the number of stages exceeds 3.

The authors should state what "computational effort" means here. What is high? Is it 5 minutes, 10 minutes? A day? Why is this a problem? Is this the only problem that this package, gsMAMS, is solving?

This is the major hurdle of using MAMS package.

I would suggest stating "The long computational time is a major drawback of using the MAMS package.", rather than stating "hurdle".

The computational complexity of this package is very low.

I would argue that this package has reasonably high computational complexity, as there are many large functions in the package. What do the authors mean by the complexity being very low, and why is this relevant and important?

The FWER is controlled by Dunnett correction, which entails finding the root of an integral of a multivariate normal distribution.

I suggest describing all acronyms in their first use. What does FWER stand for? What is a Dunnett correction, and why is that important? Could the authors provide a citation here?

The multivariate normal densities are evaluated using the package mvtnorm

The mvtnorm package should be cited here. But more importantly, why are these densities being evaluated in the first place? The authors have provided this sentence with no context for the reasons of this being used.
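For readers who are not familiar with the approach, a minimal sketch of what "finding the root of an integral of a multivariate normal distribution" could look like is shown below. This is my own illustration rather than the package's actual code, and it assumes equal allocation so that the K test statistics against the shared control have pairwise correlation 0.5.

library(mvtnorm)

# Illustrative one-sided Dunnett-type critical value for K arms vs a shared control.
# Under the global null the K test statistics are multivariate normal with unit
# variances and pairwise correlation 0.5 (equal allocation assumed).
dunnett_crit <- function(K, alpha = 0.05) {
  sigma <- matrix(0.5, K, K)
  diag(sigma) <- 1
  # Find the critical value cv such that P(all K statistics < cv) = 1 - alpha
  f <- function(cv) pmvnorm(upper = rep(cv, K), sigma = sigma)[1] - (1 - alpha)
  uniroot(f, interval = c(0, 5))$root
}

dunnett_crit(K = 4)  # roughly 2.16 for a one-sided alpha of 0.05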

The package is efficient for any number of treatment arms and stages, but it has a limitation that it is only configured for 10 stages.

This sentence suggests that there are no limitations, but then states that there are indeed limitations. I suggest clearly stating that it can only be used for up to 10 stages - is that what the authors mean? And does this mean that there can be 100 arms and 10 stages?

But for gsMAMS package with same trial configuration, the computational time to obtain stopping boundaries and sample size for three stages and four stages design is around 0.06 seconds for both the cases.

This sentence should be rewritten for expression - e.g.,

However, using the same trial configuration as in MAMS, the gsMAMS package takes approximately 0.06 seconds to identify stopping boundaries and sample size for both three- and four-stage designs. This is approximately 7000-270000 times faster.

This result should then be referenced with a table of computation times. I would suggest using the microbenchmark or bench R packages to demonstrate the comparison.
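For example, a minimal sketch of such a comparison with the bench package could look like the following; mams_design_call() and gsmams_design_call() are hypothetical placeholders for the actual MAMS and gsMAMS design calls used in the paper.

library(bench)

# Placeholder wrappers: substitute the actual MAMS and gsMAMS design calls
# for the trial configuration described in the paper.
timings <- bench::mark(
  MAMS   = mams_design_call(),
  gsMAMS = gsmams_design_call(),
  check  = FALSE  # the two packages return differently structured objects
)

timings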

The design parameters of the trial can be calculated using the design_cont function and the arguments in the function correspond to standardized effect size in ineffective arm(delta0} and effective arm(delta1), type I error(alpha), type II error(beta), total number of treatment arms(K) and the information time (0.5, 1) is denoted by frac argument in the function.

When mentioning functions in the text they should go in backticks and have parentheses added, e.g., `design_cont()`. Additionally there is an errant } after arm(delta0}. Function arguments should be referenced in either backticks or quotes, e.g., `frac` or "frac". Also, there should be a space between the word and the parentheses.

All function calls in the paper should be styled to have spaces around the = and should be written with a new line for each argument, as specified in the documentation, e.g., https://github.com/Tpatni719/gsMAMS/blob/main/R/design_ord.R#L12-L18. This is important because it makes the function calls easier for the user to read.
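As an illustration of this styling, here is the op_power_surv() call quoted later in this review (argument values unchanged), rewritten with one argument per line and spaces around the =:

power_survey_results <- op_power_surv(
  m0    = 20,
  alpha = 0.05,
  beta  = 0.1,
  p     = 4,
  frac  = c(1/2, 1),
  HR0   = 1,
  HR    = 0.6703,
  nsim  = 10000,
  ta    = 40,
  tf    = 20,
  kappa = 1,
  eta   = 0,
  seed  = 12
)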

For FWER and Stagewise FWER:

The operating characteristics of the trial can be generated using the op_power_cont and op_fwer_cont functions for power under alternative hypothesis and FWER under global null hypothesis respectively. Most of the arguments in the function are similar to size and SCPRT functions with the exception of number of simulations(nsim) and seed number(seed).

I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form part of the introductory sentence for this paragraph. Functions should also be wrapped in backticks and have parentheses added, as mentioned above. The SCPRT function is also mentioned here in all capitals, but no example is given and no definition is given of what this function is or means in this context.

For ordinal outcome, we will consider ASCLEPIOS trial, a phase II trial for patients with stroke

This trial should be cited.

We will consider the treatment worthwhile if the odds ratio between the effective and control arms is 3.06 and we set the null odds ratio to be 1.32 which is the odds ratio between the ineffective and control arms

Why is an odds ratio of 3.06 considered successful? Can the authors provide some citation for these numbers?

 The design parameters for a five-arm (K = 4) trial

I believe the parameters are now all lowercase. Also, the documentation states that k is the "Number of treatment arms." So should this be k = 5?

outcomes in control group(), odds ratio of ineffective treatment group vs control(), odds ratio of effective treatment group vs control()

Why is "group()", and "control()" written like this?

For survival outcome, we will consider a MAMS trial with five arms (four treatment arms and a control arm, K=4) and two interim looks with balanced information time (0.5, 1). The null hazards ratio is 1 and the alternative hazards ratio is 0.65.

Could the authors indicate which of these parameters link back to the function arguments?

Reproducibility

I know that the author is concerned about the paper taking a long time to run with so many simulations, but this could be solved by specifying a small number of simulations while writing the code and getting the markdown syntax right, and then updating the number of simulations and knitting the document once, as sketched below.
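For example, a small sketch of how this could be parameterised in an R Markdown document (the field name nsim is just an illustration):

---
params:
  nsim: 100   # keep this small while drafting; raise it (e.g. to 10000) for the final knit
---

Inside the code chunks you would then pass nsim = params$nsim to the operating characteristic functions, so only the YAML header needs to change before the final knit.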

The paper should use rmarkdown for formatting, to ensure reproducibility, as mentioned in #16. It might seem pedantic to insist that you write the results here like this instead of copying them, but in trying to replicate this paper I found that I was getting different results to what you had specified, which makes me concerned that the results are perhaps inaccurate.

In addition, in trying to run the results of this paper I encountered several errors because the syntax in the paper had not been updated to match the latest changes made in the software.

The numbers in the text are specified by hand based on the above results. These results should be inserted using inline R syntax to ensure the right numbers are specified. Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about which parts of the output certain numbers referred to. For example, the authors state:

The overall stopping probability should be around 1 which is the case here

However the stopping probability is listed as:

$`Stopping probability under alternative`
 look1  look2 
0.3334 0.6666 

Which is not 1.

If you use rmarkdown inline syntax, you could write:

library(gsMAMS)  # for op_power_surv()
library(scales)  # for percent()

power_survey_results <- op_power_surv(m0 = 20, alpha = 0.05, beta = 0.1, p = 4, frac = c(1/2, 1), HR0 = 1, HR = 0.6703, nsim = 10000, ta = 40, tf = 20, kappa = 1, eta = 0, seed = 12)
power_pct <- percent(power_survey_results$Power, accuracy = 0.01)
pr_success_first <- percent(power_survey_results$`Stopping probability under alternative`[1], accuracy = 0.01)
pr_success_second <- percent(power_survey_results$`Stopping probability under alternative`[2], accuracy = 0.01)

Based on the simulation results, the probability of success (or power) at the first stage is `r pr_success_first` and at the second stage is approximately `r pr_success_second`. Therefore, the overall power is approximately `r power_pct`.

It also feels strange to, in one instance, report the percentage to one decimal place, but then, for the power, to state "the overall power is around 90%". I suggest stating 91.3%.

Tpatni123 commented 7 months ago

The paper should use rmarkdown for formatting, to ensure reproducibility, as mentioned in https://github.com/Tpatni719/gsMAMS/issues/16. It might seem pedantic to insist that you write the results here like this instead of copying them, but in trying to replicate this paper I found that I was getting different results to what you had specified, which makes me concerned that the results are perhaps inaccurate.

The numbers are correct. I think you didn't use the seed mentioned in the paper. I have just replicated the results for the survival outcome (power configuration): [image]

In addition, in trying to run the results of this paper I encountered several errors because the syntax in the paper had not been updated to match the latest changes made in the software.

I will update the hazard ratios in operating characteristics functions.

Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about which parts of the output certain numbers referred to. For example, the authors state:

The overall stopping probability should be around 1 which is the case here

However the stopping probability is listed as:

$`Stopping probability under alternative`
 look1  look2 
0.3334 0.6666 

Which is not 1.

These two (0.3334 + 0.6666) add up to 1. I don't know what you mean by "Which is not 1".

Tpatni123 commented 7 months ago

It also feels strange to, in one instance, report the percentage to one decimal place, but then, for the power, to state "the overall power is around 90%". I suggest stating 91.3%.

For a clinician who is running a trial, this is pretty much self-explanatory, because the main purpose of running this operating characteristics function is to see whether we reach the desired power or not, and in this case we did. So that's why I didn't state the exact number and just mentioned that we reached the desired power, which is 90%.

Tpatni719 commented 7 months ago

Using \ inside a sentence is not something I would expect in journal-standard writing. I suggest replacing instances of this with the word "or", e.g., "Based on the simulation results, the probability of success (or power)..." Additionally, using words like "around" here is informal, and I would replace instances of "around" with "approximately", or remove the word entirely.

Done!

The authors should provide a citation for this statement, and should state why they are inefficient.

I have provided the citation and I think the citation is sufficient here.

The authors should state what "computational effort" means here. What is high? Is it 5 minutes, 10 minutes? A day? Why is this a problem? Is this the only problem that this package, gsMAMS, is solving?

I have quantified the high computational effort of the MAMS package relative to our package in the computational aspects section. A clinician/researcher doesn't want to wait a long time (e.g. 3 hours) just to get the design parameters, and if he/she wants to tweak some parameters to see how the design changes, then the clinician has to wait another 3 hours to get the design parameters again. So this is a very obvious problem, which I think I don't have to explain explicitly for the target audience.

I would suggest stating "The long computational time is a major drawback of using the MAMS package.", rather than stating "hurdle".

Done!!

I would argue that this package has reasonably high computational complexity, as there are many large functions in the package. What do the authors mean by the complexity being very low, and why is this relevant and important?

Our design functions have a low computational time relative to the MAMS package, which is what we have demonstrated in the paper. We have not compared the operating characteristics functions. This is relevant and important for the reasons mentioned above. To add an additional point, sometimes we have to change the design parameters for a trial which is already running. In that case, we can't wait 4-5 hours just to get the design parameters, as we don't want to enroll further patients if the trial claims futility based on the new parameters.

I suggest describing all acronyms in their first use. What does FWER stand for? What is a Dunnett correction, and why is that important? Could the authors provide a citation here?

The Dunnett correction is a common multiple comparison procedure, and the audience using such a package is already cognizant of such methods. Respectfully, I think the current information is sufficient to implement the package and understand the functions. I have included links to the paper in the description for people interested in the methodology.

The mvtnorm package should be cited here. But more importantly, why are these densities being evaluated in the first place? The authors have provided this sentence with no context for the reasons of this being used.

I have cited the package. These densities are evaluated because, under the global null, we assume a multivariate normal distribution; again, all the details are in the methodology paper.

This sentence suggests that there are no limitations, but then states that there are indeed limitations. I suggest clearly stating that it can only be used for up to 10 stages - is that what the authors mean? And does this mean that there can be 100 arms and 10 stages?

In an actual clinical trial setting, it is very unlikely to have more than 10 interim looks. So, contextually, it is not a limitation considering the nature of a clinical trial, which is why we have structured the paragraph in this way.

This sentence should be rewritten for expression - e.g., This result should then be referenced with a table of computation times. I would suggest using the microbenchmark or bench R packages to demonstrate the comparison.

I have changed the wording of the expression and I have already provided an example in the paper which I think is enough to demonstrate the low computational time of our package.

When mentioning functions in the text they should go in backticks and have parentheses added, e.g., `design_cont()`. Additionally there is an errant } after arm(delta0}. Function arguments should be referenced in either backticks or quotes, e.g., `frac` or "frac". Also, there should be a space between the word and the parentheses. All function calls in the paper should be styled to have spaces around the = and should be written with a new line for each argument, as specified in the documentation, e.g., https://github.com/Tpatni719/gsMAMS/blob/main/R/design_ord.R#L12-L18. This is important because it makes the function calls easier for the user to read.

Done!!

I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form part of the introductory sentence for this paragraph. Functions should also be wrapped in backticks and have parentheses added, as mentioned above. The SCPRT function is also mentioned here in all capitals, but no example is given and no definition is given of what this function is or means in this context.

"FWER and Stagewise FWER" is just a sub-heading under each type of outcome. I have addressed the latter part of the comment regarding the functions.

I believe the parameters are now all lowercase. Also, the documentation states that k is the "Number of treatment arms." So should this be k = 5?

I have changed it to lowercase and it is k = 4; k does not include the control arm.

Why is "group()", and "control()" written like this?

I have checked the paper and I have provided the arguments inside the brackets.

Could the authors indicate which of these parameters link back to the function arguments?

k = 4, hr0 = 1, hr1 = 0.67

njtierney commented 6 months ago

Reproducibility

The numbers are correct. I think you didn't use the seed mentioned in the paper. I have just replicated the results for the survival outcome (power configuration)

My point is that there is manual copying and pasting of results, which is a very easy place to introduce human errors. I suggest using rmarkdown to generate the .md format, as this will help eliminate this problem.
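For example, assuming the manuscript source lives in a file called paper.Rmd (the filename here is just an assumption), the markdown output could be generated with:

rmarkdown::render("paper.Rmd", output_format = "md_document")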

Regarding writing, I stated:

Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about which parts of the output certain numbers referred to. For example, the authors state: "The overall stopping probability should be around 1 which is the case here". However the stopping probability is listed as:

$`Stopping probability under alternative`
 look1  look2 
0.3334 0.6666 
Which is not 1.

In response you said:

These two (0.3334 + 0.6666) add up to 1. I don't know what you mean by "Which is not 1".

My point then is that this is not made clear in the text.

Writing

For a clinician who is running a trial, this is pretty much self-explanatory, because the main purpose of running this operating characteristics function is to see whether we reach the desired power or not, and in this case we did. So that's why I didn't state the exact number and just mentioned that we reached the desired power, which is 90%.

My point is that in a journal standard of writing the writing should be precise. The text could instead state something like what you just said - that "the desired power of 90% has been met, with the power being 91.3%". Does that make sense?

Expanding on citation

I have provided the citation and I think the citation is sufficient here.

The context here was my comment on this sentence in the paper:

Traditional two-arm randomized control trials are not an optimal choice when multiple experimental arms are available for testing efficacy.

I disagree with your comment that the citation is sufficient. My point is this: This sentence does not describe why these are not optimal - with respect to what? Statistical power? Cost? Clinical outcomes? The paper that you reference gives several reasons for why multi-arm trials are better, and so I think it is reasonable to add a short sentence describing the reasons they are optimal or a better choice.

Computation

I have quantified the high computational effort of the MAMS package relative to our package in the computational aspects section. A clinician/researcher doesn't want to wait a long time (e.g. 3 hours) just to get the design parameters, and if he/she wants to tweak some parameters to see how the design changes, then the clinician has to wait another 3 hours to get the design parameters again. So this is a very obvious problem, which I think I don't have to explain explicitly for the target audience.

Given that JOSS is a journal focussing on open source software, I think it is reasonable to explain the computational aspects of the software you have written. You state:

But the computational effort of obtaining stopping boundaries is very high when the number of stages exceeds 3. The long computational time is the major drawback of using the MAMS package.

The phrase "computational effort" is vague. I know that you have given more detail in the "computational aspects" section, but in this specific sentence that I have described, I suggest taking some of what you said above about the long computational time, and putting that into appropriate text in the journal.

Computation - why is gsMAMS faster

In the paper you have not mentioned why your method is so much faster; I think this is actually really important to address. Is there a special method or approach that you are using? Why is MAMS slow in comparison?

Writing

I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form part of the introductory sentence for this paragraph.

This still has not been addressed - this should either be a heading, or the start of a sentence.

Thank you for making the changes in referencing k, frac, hr0 and hr1; however, there are still some things that need tidying up:

For survival outcome, we will consider a MAMS trial with five arms (four treatment arms and a control arm, k=4) and two interim looks with balanced information time frac=c(0.5, 1). The null hazards ratio is (hr0)1 and the alternative hazards ratio is (hr1)0.67.

The arguments should be in parentheses, e.g., "(frac = c(0.5, 1))".

Tpatni123 commented 6 months ago

My point then is that this is not made clear in the text.

I have changed it, but I have already mentioned and explained it thoroughly in the continuous outcome section. So, I don't know why this is not clear in the survival outcome section. [image]

My point is that in a journal standard of writing the writing should be precise. The text could instead state something like what you just said - that "the desired power of 90% has been met, with the power being 91.3%". Does that make sense?

Done!! [image]

I disagree with your comment that the citation is sufficient. My point is this: This sentence does not describe why these are not optimal - with respect to what? Statistical power? Cost? Clinical outcomes? The paper that you reference gives several reasons for why multi-arm trials are better, and so I think it is reasonable to add a short sentence describing the reasons they are optimal or a better choice.

Done!! [image]

The phrase "computational effort" is vague. I know that you have given more detail in the "computational aspects" section, but in this specific sentence that I have described, I suggest taking some of what you said above about the long computational time, and putting that into appropriate text in the journal.

Done!! [image]

In the paper you have not mentioned why your method is so much faster; I think this is actually really important to address. Is there a special method or approach that you are using? Why is MAMS slow in comparison?

Done! I have added the necessary details (the details regarding the SCPRT boundary calculation are already mentioned in this section) and added the paper for reference. [image]

This still has not been addressed - this should either be a heading, or the start of a sentence. The arguments should be in parentheses, e.g., "(frac = c(0.5, 1))".

Done!! [image] [image]

njtierney commented 6 months ago

Thank you for taking the time to address these changes!

The paper requires a proofread for minor grammar checks - for example, there are a few instances of no spaces after parentheses and no spaces after commas. In other journals the paper would go through a proofread by a manuscript editor and they would make these changes. However, I think that JOSS does not provide this; would you be able to check the paper for these changes and other grammatical fixes?

Nearly there!

Tpatni719 commented 6 months ago

Thank you for all the recommendations! And sure, I will do that and apprise you about it.

Tpatni719 commented 6 months ago

I have done the corrections and thanks again for the recommendations!

njtierney commented 6 months ago

Great, thanks!