Closed shajoezhu closed 7 years ago
"So I'm generally positive about this paper. I don't have major concerns, but as it stands it is not very easy to read. It is laid out in the classic mathematical style, which is to say to get to the results the reader has to slog through a lot of complex descriptions of MCMC updates, which have not been given any context or intuition. The writing is not bad but the ms would benefit hugely from a) a reorganisation to hide the gore from an interested biological-minded reader, and b) some effort to explain the details in intuitive terms. Some specific suggestions are listed below."
I think he/she has a point here. A broad, step-by-step discussion of the algorithm, with the math moved into the supplementary material, would make the paper more appealing and easier to read.
"More technically, I found the technical details to be slightly unsatisfactorily explored. Specific concerns were the arbitrary value of G=20 (page 4) which scales the recombination rate. This is pretty unconvincing. I agree that the model usually allows for some misspecification of the recombination rate but something much better could be done. Either do the right thing (inference of G by EM or analogously) or show that it is insensitive."
A fair point, although in my experience with Pf the painting model tends to be very robust unless extreme recombination values are used (I tried a range of values for the inbreeding analysis, for instance). We can rerun the model with different scaling factors to show this, or go for the EM run, but I would avoid implementing anything new at this point.
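A minimal sketch of what the insensitivity check could look like. This is not DEploid code: it is a toy Li-and-Stephens-style copying model with a two-haplotype panel, and it assumes that G enters the switch probability as `1 - exp(-G * d)`, with `d` the genetic distance between neighbouring sites. The function names, the miscopy rate, the distances and the range of G values are all made up for illustration.

```python
import math

def switch_prob(G, dist):
    # Assumed form: probability of a recombination event between two sites.
    return 1.0 - math.exp(-G * dist)

def viterbi_copy_path(target, panel, dists, G, miscopy=0.01):
    """Most likely sequence of panel haplotypes copied by `target` (toy model)."""
    n_hap, n_sites = len(panel), len(target)

    def log_emit(k, j):
        match = panel[k][j] == target[j]
        return math.log(1.0 - miscopy if match else miscopy)

    score = [math.log(1.0 / n_hap) + log_emit(k, 0) for k in range(n_hap)]
    back = []
    for j in range(1, n_sites):
        r = switch_prob(G, dists[j - 1])
        log_stay = math.log(1.0 - r + r / n_hap)
        log_jump = math.log(r / n_hap)
        best_prev = max(range(n_hap), key=lambda k: score[k])
        new_score, ptr = [], []
        for k in range(n_hap):
            stay = score[k] + log_stay
            jump = score[best_prev] + log_jump
            if stay >= jump:
                new_score.append(stay + log_emit(k, j))
                ptr.append(k)
            else:
                new_score.append(jump + log_emit(k, j))
                ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    k = max(range(n_hap), key=lambda k: score[k])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path

# Toy data: the target copies haplotype 0 for five sites, then haplotype 1.
panel = [[0] * 10, [1] * 10]
target = [0] * 5 + [1] * 5
dists = [0.001] * 9  # hypothetical genetic distances between adjacent sites

G_VALUES = [5, 10, 20, 40, 80]  # hypothetical sweep around G=20
paths = {G: viterbi_copy_path(target, panel, dists, G) for G in G_VALUES}
insensitive = len({tuple(p) for p in paths.values()}) == 1
print("same copying path for every G:", insensitive)
```

For the real check we would replace the toy Viterbi call with actual model runs at each scaling factor and compare the inferred haplotypes and proportions across runs.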
"I also disliked the anecdotal nature of Figure 2 - I was not clear what the general take-home message was meant to be, and the plot with its many black bars is quite confusing."
We need a different representation for the haplotypes, perhaps one that renders only the differences.
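One possible way to get at "differences only", sketched under the assumption that haplotypes are equal-length lists of 0/1 alleles: collapse the sites where two haplotypes disagree into runs, so the figure can draw a handful of segments instead of one bar per site. The function name and the example data are hypothetical.

```python
def mismatch_segments(inferred, reference):
    """Return (start, end) index pairs, end exclusive, where the haplotypes differ."""
    if len(inferred) != len(reference):
        raise ValueError("haplotypes must have the same number of sites")
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(inferred, reference)):
        if a != b:
            if start is None:
                start = i  # open a new run of mismatches
        elif start is not None:
            segments.append((start, i))  # close the current run
            start = None
    if start is not None:
        segments.append((start, len(inferred)))
    return segments

# Made-up example haplotypes.
inferred = [0, 1, 1, 0, 0, 1, 0, 1]
reference = [0, 1, 0, 0, 0, 1, 1, 0]
segments = mismatch_segments(inferred, reference)
print(segments)
```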
Reviewer: 3
Comments to the Author
Review of Zhi et al "Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data"
This paper describes how to infer the mixture decomposition of multiple strains of haploid organisms when multiple, related strains may be present in the same sample. This is an important problem in bacterial genetics, as argued by the authors, and they present a workable solution to this. The solution, which uses a copying model and performs Markov chain Monte Carlo analysis to extract the appropriate details for the copying model, is an interesting and novel application of these methods. To the best of my understanding it is correctly implemented and performs a useful job.
[x] So I'm generally positive about this paper. I don't have major concerns, but as it stands it is not very easy to read. It is laid out in the classic mathematical style, which is to say to get to the results the reader has to slog through a lot of complex descriptions of MCMC updates, which have not been given any context or intuition. The writing is not bad but the ms would benefit hugely from a) a reorganisation to hide the gore from an interested biological-minded reader, and b) some effort to explain the details in intuitive terms. Some specific suggestions are listed below.
[x] More technically, I found the technical details to be slightly unsatisfactorily explored. Specific concerns were the arbitrary value of G=20 (page 4) which scales the recombination rate. This is pretty unconvincing. I agree that the model usually allows for some misspecification of the recombination rate but something much better could be done. Either do the right thing (inference of G by EM or analogously) or show that it is insensitive.
[x] I also disliked the anecdotal nature of Figure 2 - I was not clear what the general take-home message was meant to be, and the plot with its many black bars is quite confusing.
Minor comments:
[x] Figure 3: c is a noisy plot. It would be much clearer if shown with smoothing. It would help the reader if the legend stated the take-home message of each plot.
[x] Page 2 right: what is c? It isn't defined. In general the model section needs some effort in clarification.
[x] Page 2: sp: inversley
[x] Page 3: titre: this is not a common term. What is wrong with concentration? I think this is what you mean anyway? I find no evidence that titre has this meaning in statistics, only in chemistry, though I appreciate that there are many fields I'm not familiar with.
[x] Page 4: "Such erroneous markers are not currently inferred by DEploid, though this could be included in future versions." If it is easy, do it. If it is not easy, don't offer. In my experience very few pieces of academic software are maintained and developed in this way.