Closed njtierney closed 6 months ago
The paper should use rmarkdown for formatting, to ensure reproducibility, as mentioned in https://github.com/Tpatni719/gsMAMS/issues/16. It might seem pedantic to insist that you write the results here like this instead of copying them, but in trying to replicate this paper I found that I was getting different results to what you had specified. Which makes me concerned that the results perhaps are inaccurate.
The numbers are correct. I think you didn't use the seed mentioned in the paper. I have just replicated the results for the survival outcome (power configuration).
In addition, in trying to run the results of this paper I encountered several errors from the author not updating the syntax in the paper to use the latest syntax from the changes made in the software.
I will update the hazard ratios in operating characteristics functions.
Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about where certain parts of these results were referred. For example, the authors state:
The overall stopping probability should be around 1 which is the case here
However the stopping probability is listed as:
$`Stopping probability under alternative`
look1 look2
0.3334 0.6666
Which is not 1.
These two (0.3334 + 0.6666) add up to 1. I don't know what you mean by "Which is not 1".
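The arithmetic behind this reply can be made explicit. A minimal sketch in R, using the two per-look values quoted above; the vector name `stop_prob` is an assumption for illustration, not an object returned by the package:

```r
# Per-look stopping probabilities under the alternative, as quoted above.
stop_prob <- c(look1 = 0.3334, look2 = 0.6666)

# The overall stopping probability is the sum across the looks.
overall_stop <- sum(stop_prob)

# Compare with a tolerance rather than `==` to avoid floating-point surprises.
isTRUE(all.equal(overall_stop, 1))
```

Reporting the summed value alongside the per-look values would make the "around 1" statement checkable at a glance.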
It also feels strange to report the percentage to 1 decimal place in one instance, but then to state for the power that "the overall power is around 90%". I suggest stating 91.3%.
For a clinician who is running a trial, this is pretty much self-explanatory because the main purpose of running this operating characteristics function is to see whether we reach the desired power or not and in this case we did. So, that's why I didn't mention the number exactly and I just mentioned that we reached the desired power which is 90%.
Using `\` inside a sentence is not something I would expect in a standard of writing in a journal. I suggest replacing instances of this with the word "or". E.g., "Based on the simulation results, the probability of success (or power)..." Additionally, using words like "around" here is informal, and I would replace instances of "around" with "approximately", or remove the word entirely.
Done!
The authors should provide a citation for this statement, and should state why they are inefficient.
I have provided the citation and I think citation is sufficient here.
The authors should state what the computation effort here means. What is high? Is it 5 minutes, 10 minutes? A day? Why is this a problem? Is this the only problem that this package, gsMAMS is solving?
I have quantified the high computational effort of the MAMS package relative to our package in the computational aspects section. A clinician/researcher doesn't want to wait for a long duration (e.g. 3 hours) just to get the design parameters, and if he/she wants to tweak some parameters to see how the design changes, then the clinician has to wait another 3 hours to get the design parameters. So, this is a very obvious problem which I think I don't have to explain explicitly for the target audience.
I would suggest stating "The long computational time is a major drawback of using the MAMS package.", rather than stating "hurdle".
Done!!
I would argue that this package has reasonably high computational complexity, as there are many large functions in the package. What do the authors mean by the complexity being very low, and why is this relevant and important?
Our design functions have low computational time relative to MAMS package which is what we have demonstrated in the paper. We have not compared the operating characteristics functions. This is relevant and important for the reasons mentioned above and just to add an additional point, sometimes, we have to change the design parameters for a trial which is already running. So, in that case, we can't wait for 4-5 hours just to get the design parameters as we don't want to enroll further patients if the trial claimed futility based on the new parameters.
I suggest describing all acronyms in their first use. What does FWER stand for? What is a Dunnett correction, and why is that important? Could the authors provide a citation here?
Dunnett correction is a common multiple testing comparison procedure and the audience using such package is already cognizant of such methods. And respectfully, I think the current information is sufficient to implement the package and understand the functions. I have mentioned the links of the paper in the description for people interested in the methodology.
The mvtnorm package should be cited here. But more importantly, why are these densities being evaluated in the first place? The authors have provided this sentence with no context for the reasons of this being used.
I have cited the package and these densities are evaluated because under global null, we assume a multivariate normal distribution and again, all the details are in the methodology paper.
This sentence suggests that there are no limitations, but then states that there are indeed limitations. I suggest clearly stating that it can be only used for up to 10 stages? Is that what the authors mean, and does this mean that there can be 100 arms, and 10 stages?
In an actual clinical trial setting, it is very very unlikely to have more than 10 interim looks. So, contextually, it is not a limitation considering the nature of a clinical trial and that's why we have structured the paragraph in such a way.
This sentence should be rewritten for expression - e.g., This result should then be referenced with a table of computation time. I would suggest using the `microbenchmark` or `bench` R packages to demonstrate the comparison.
I have changed the wording of the expression and I have already provided an example in the paper which I think is enough to demonstrate the low computational time of our package.
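One way to produce such a table is sketched below with base R's `system.time()` so that it runs without extra packages; `microbenchmark::microbenchmark()` or `bench::mark()` would give more careful repeated measurements. The functions `fast_design()` and `slow_design()` are hypothetical stand-ins, not functions from gsMAMS or MAMS.

```r
# Hypothetical stand-ins for two design routines; replace with real calls.
fast_design <- function() sum(sqrt(seq_len(1e4)))
slow_design <- function() {
  total <- 0
  for (i in seq_len(1e4)) total <- total + sqrt(i)
  total
}

# Run a routine `reps` times and return the total elapsed seconds.
time_it <- function(f, reps = 20) {
  unname(system.time(for (r in seq_len(reps)) f())["elapsed"])
}

# Collect the timings into a table that can be printed in the paper.
timing_table <- data.frame(
  method = c("fast_design", "slow_design"),
  elapsed_seconds = c(time_it(fast_design), time_it(slow_design))
)
print(timing_table)
```

Generating the table from code, rather than quoting a single timing in prose, keeps the comparison reproducible on the reader's own machine.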
When mentioning functions in the text they should go in backticks and parentheses added, e.g., `design_cont()`. Additionally there is an errant `}` after `argm(delta0}`. Function arguments should be referenced in either backticks or quotes, e.g., `frac` or "frac". Also, there should be a space between the word and the parentheses. All functions in the paper should be styled to have spaces around the `=` and all functions should be written with new lines for each argument, as specified in the documentation, e.g., https://github.com/Tpatni719/gsMAMS/blob/main/R/design_ord.R#L12-L18. This is important because it makes the function easier to read for the user.
Done!!
I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form some part of the introduction sentence for this paragraph. Functions should also be wrapped in backticks and have parentheses added as mentioned above. The SCPRT function is also mentioned here in all capitals but no example is given and no definition is given of what this function is or means in this context.
FWER and Stagewise FWER are just a sub-heading under each type of outcome. I have addressed the latter part of the function.
The parameters I believe are now all lower case. Also the documentation states that k is the "Number of treatment arms." So should this be k = 5?
I have changed it to lowercase and it is k=4. k is without the control arm.
Why is "group()", and "control()" written like this?
I have checked the paper and I have provided the arguments inside the brackets.
Could the authors indicate which of these parameters link back to the function arguments?
k=4, hr0=1, hr1=0.67
The numbers are correct. I think you didn't use the seed mentioned in the paper. I have just replicated the results for the survival outcome (power configuration).
My point is that there is manual copying and pasting of results, which is a very easy place to introduce human errors. I suggest using rmarkdown to generate the .md format, as this will help eliminate this problem.
Regarding writing, I stated:
Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about where certain parts of these results were referred. For example, the authors state: The overall stopping probability should be around 1 which is the case here. However the stopping probability is listed as:
$`Stopping probability under alternative`
look1 look2
0.3334 0.6666
Which is not 1.
In response you said:
These two (0.3334 + 0.6666) add up to 1. I don't know what you mean by "Which is not 1".
My point then is that this is not made clear in the text.
For a clinician who is running a trial, this is pretty much self-explanatory because the main purpose of running this operating characteristics function is to see whether we reach the desired power or not and in this case we did. So, that's why I didn't mention the number exactly and I just mentioned that we reached the desired power which is 90%.
My point is that in a journal standard of writing the writing should be precise. The text could instead state something like what you just said - that "the desired power of 90% has been met, with the power being 91.3%". Does that make sense?
I have provided the citation and I think citation is sufficient here.
in which the context was my comment on this sentence in the paper:
Traditional two-arm randomized control trials are not an optimal choice when multiple experimental arms are available for testing efficacy.
I disagree with your comment that the citation is sufficient. My point is this: This sentence does not describe why these are not optimal - with respect to what? Statistical power? Cost? Clinical outcomes? The paper that you reference gives several reasons for why multi-arm trials are better, and so I think it is reasonable to add a short sentence describing the reasons they are optimal or a better choice.
I have quantified the high computational effort of the MAMS package relative to our package in the computational aspects section. A clinician/researcher doesn't want to wait for a long duration (e.g. 3 hours) just to get the design parameters, and if he/she wants to tweak some parameters to see how the design changes, then the clinician has to wait another 3 hours to get the design parameters. So, this is a very obvious problem which I think I don't have to explain explicitly for the target audience.
Given that JOSS is a journal focussing on open source software, I think it is reasonable to explain the computational aspects of the software you have written. You state:
But the computational effort of obtaining stopping boundaries is very high when the number of stages exceeds 3. The long computational time is the major drawback of using the `MAMS` package.
The phrase "computational effort" is vague. I know that you have given more detail in the "computational aspects" section, but in this specific sentence that I have described, I suggest taking some of what you said above about the long computational time, and putting that into appropriate text in the journal.
In the paper you have not mentioned why your method is so much faster; I think that this is actually really important to address. Is there a special method or approach that you are using? Why is MAMS slow in comparison?
I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form some part of the introduction sentence for this paragraph.
This still has not been addressed - this should either be a heading, or the start of a sentence.
Thank you for making the changes in referencing `k`, `frac`, `hr0` and `hr1`; however, there are still some things that need tidying up:
For survival outcome, we will consider a MAMS trial with five arms (four treatment arms and a control arm, `k`=4) and two interim looks with balanced information time `frac`=c(0.5, 1). The null hazards ratio is (`hr0`)1 and the alternative hazards ratio is (`hr1`)0.67.
The arguments should be in parentheses, e.g., "(`frac` = c(0.5, 1))".
My point then is that this is not made clear in the text.
I have changed it but I have already mentioned and explained it thoroughly in the continuous outcome. So, I don't know why this is not clear in survival outcome.
My point is that in a journal standard of writing the writing should be precise. The text could instead state something like what you just said - that "the desired power of 90% has been met, with the power being 91.3%". Does that make sense?
Done!!
I disagree with your comment that the citation is sufficient. My point is this: This sentence does not describe why these are not optimal - with respect to what? Statistical power? Cost? Clinical outcomes? The paper that you reference gives several reasons for why multi-arm trials are better, and so I think it is reasonable to add a short sentence describing the reasons they are optimal or a better choice.
Done!!
The phrase "computational effort" is vague. I know that you have given more detail in the "computational aspects" section, but in this specific sentence that I have described, I suggest taking some of what you said above about the long computational time, and putting that into appropriate text in the journal.
Done!!
In the paper you have not mentioned why your method is so much faster; I think that this is actually really important to address. Is there a special method or approach that you are using? Why is MAMS slow in comparison?
Done! I have added the necessary details (the details regarding the SCPRT boundary calculation are already mentioned in this section) and added the paper for reference.
This still has not been addressed - this should either be a heading, or the start of a sentence. The arguments should be in parentheses, e.g., "(frac = c(0.5, 1))".
Done!!
Thank you for taking the time to address these changes!
The paper requires a proofread for minor grammar checks - for example, there are a few instances of no spaces after parentheses and no spaces after commas. In other journals the paper would go through a proofread by a manuscript editor, who would make these changes. However, I think that JOSS does not provide this. Would you be able to check the paper for these changes and other grammatical fixes?
Nearly there!
Thank you for all the recommendations! And sure, I will do that and apprise you about it.
I have done the corrections and thanks again for the recommendations!
Great, thanks!
The JOSS Guidelines state:
There are several writing improvements that I believe should be required of a journal publication for this paper. I have tried to list them for the key components of the paper but have not comprehensively listed every single change that I think is required. For instance I have mentioned there are standard ways to list functions and arguments using back ticks or quotes, but I have not listed every single change that is required, but hopefully it should be clear.
Overall I think that this paper requires improvement in terms of language, and also structure, and reproducibility. I will give a few examples of these improvements in this section.
Language and structure in the paper
The paper has a few spelling mistakes - for example: "chararcteristics", "Compuational", "Desinging".
The paper needs editing for structure and writing quality. For example:
Using `\` inside a sentence is not something I would expect in a standard of writing in a journal. I suggest replacing instances of this with the word "or". E.g., "Based on the simulation results, the probability of success (or power)..."
Additionally, using words like "around" here is informal, and I would replace instances of "around" with "approximately", or remove the word entirely.
The authors should provide a citation for this statement, and should state why they are inefficient.
This is a very long sentence.
The authors should state what the computation effort here means. What is high? Is it 5 minutes, 10 minutes? A day? Why is this a problem? Is this the only problem that this package, `gsMAMS`, is solving?
I would suggest stating "The long computational time is a major drawback of using the `MAMS` package.", rather than stating "hurdle".
I would argue that this package has reasonably high computational complexity, as there are many large functions in the package. What do the authors mean by the complexity being very low, and why is this relevant and important?
I suggest describing all acronyms in their first use. What does FWER stand for? What is a Dunnett correction, and why is that important? Could the authors provide a citation here?
The `mvtnorm` package should be cited here. But more importantly, why are these densities being evaluated in the first place? The authors have provided this sentence with no context for the reasons of this being used.
This sentence suggests that there are no limitations, but then states that there are indeed limitations. I suggest clearly stating that it can be only used for up to 10 stages? Is that what the authors mean, and does this mean that there can be 100 arms, and 10 stages?
This sentence should be rewritten for expression - e.g.,
This result should then be referenced with a table of computation time. I would suggest using the `microbenchmark` or `bench` R packages to demonstrate the comparison.
When mentioning functions in the text they should go in backticks and parentheses added, e.g., `design_cont()`. Additionally there is an errant `}` after `argm(delta0}`. Function arguments should be referenced in either backticks or quotes, e.g., `frac` or "frac". Also, there should be a space between the word and the parentheses.
All functions in the paper should be styled to have spaces around the `=` and all functions should be written with new lines for each argument, as specified in the documentation, e.g., https://github.com/Tpatni719/gsMAMS/blob/main/R/design_ord.R#L12-L18. This is important because it makes the function easier to read for the user.
I'm not sure why "For FWER and Stagewise FWER:" has a colon at the end; this should either be a new heading or form some part of the introduction sentence for this paragraph. Functions should also be wrapped in backticks and have parentheses added as mentioned above. The SCPRT function is also mentioned here in all capitals but no example is given and no definition is given of what this function is or means in this context.
This trial should be cited.
Why is an odds ratio of 3.06 considered successful? Can the authors provide some citation for these numbers?
The parameters I believe are now all lower case. Also the documentation states that `k` is the "Number of treatment arms." So should this be `k = 5`?
Why is "group()", and "control()" written like this?
Could the authors indicate which of these parameters link back to the function arguments?
Reproducibility
I know that the author is concerned about the paper taking a long time to run when running so many simulations, but the author could solve this problem by specifying a small number of simulations, then writing the code to get the syntax right for the markdown, and then update the number of simulations and knit the document once.
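That workflow can be sketched as follows. A single variable controls the number of simulations, so the document can be drafted with a small value and knitted once with the full value. The toy simulation, the name `n_sims`, the seed, and the effect size are illustrative assumptions and are not taken from the paper or the package.

```r
set.seed(1234)  # fix the seed so the knitted numbers are repeatable

n_sims <- 200   # small while drafting; raise (e.g. to 10000) for the final knit

# Toy stand-in for one simulated trial: a one-sided two-sample z-test
# at the 2.5% level, assuming known unit variance in both arms.
simulate_once <- function(n_per_arm = 50, effect = 0.5) {
  control   <- rnorm(n_per_arm, mean = 0)
  treatment <- rnorm(n_per_arm, mean = effect)
  z <- (mean(treatment) - mean(control)) / sqrt(2 / n_per_arm)
  z > qnorm(0.975)
}

# Proportion of simulated trials that reject the null hypothesis.
rejections <- replicate(n_sims, simulate_once())
estimated_power <- mean(rejections)
estimated_power
```

Because only `n_sims` changes between the draft and the final knit, the markdown syntax can be checked quickly before committing to the long run.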
The paper should use rmarkdown for formatting, to ensure reproducibility, as mentioned in #16. It might seem pedantic to insist that you write the results here like this instead of copying them, but in trying to replicate this paper I found that I was getting different results to what you had specified. Which makes me concerned that the results perhaps are inaccurate.
In addition, in trying to run the results of this paper I encountered several errors from the author not updating the syntax in the paper to use the latest syntax from the changes made in the software.
The numbers in the text are specified by hand based on the above results. These results should be inserted using inline R syntax to ensure the right numbers are specified. Indeed, trying to write the syntax below to ensure the results were the same as what the author had written led to me being uncertain about where certain parts of these results were referred. For example, the authors state:
However the stopping probability is listed as:
Which is not 1.
If you use rmarkdown inline syntax, you could write:
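A minimal sketch of the idea, assuming the simulated power is available in a hypothetical list `oc` (the value 0.913 is the power quoted in this review; the object and element names are illustrative, not the package's actual output structure). In an R Markdown document the sentence would contain an inline expression of the form `r fmt_pct(oc$power)`; the R code below shows what such an expression evaluates to.

```r
# Hypothetical container for simulation results (names are assumptions).
oc <- list(power = 0.913, target_power = 0.90)

# Format a proportion as a percentage with one decimal place.
fmt_pct <- function(x) sprintf("%.1f%%", 100 * x)

# Build the sentence from stored values instead of typing numbers by hand.
sentence <- sprintf(
  "The desired power of %s has been met, with the power being %s.",
  sprintf("%.0f%%", 100 * oc$target_power),
  fmt_pct(oc$power)
)
sentence
```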
It also feels strange to report the percentage to 1 decimal place in one instance, but then to state for the power that "the overall power is around 90%". I suggest stating 91.3%.