Closed tavareshugo closed 6 years ago
Hey Hugo,
Thanks for the great feedback, I hope QTLseqr can be useful in your research.
This is also a good suggestion for improvement. I think it could improve the accuracy for QTLseq confidence interval calling. In fact the Yang2013 trial data has different bulk sizes and I recall thinking about this, but I was a little lazy in implementing since I felt that most users have equal bulk sizes... 😉
I like your solution and it seems like how I would have done it. If you would like to implement I would greatly appreciate it. I have unfortunately never yet had a collaboration on github, so I haven't actually ever accepted pulls requests etc. But it's about time I learned how to.
I can then work on the new documentation for the function and I would need to update the runQTLseqAnalysis
function documentation as well as the vignette.
In regard to your second question, no, it is just the number of diploid individuals. I can make that clearer in the the documentation. This is the way I interpreted Tagaki et al's simulation. Is there a reason that you can think of that it should be on based on haploid copies?
Thanks again for your input and possible collaboration. Ben
Hi Ben,
I have to confess I myself don't have extensive experience in collaborating with code. But, my advice here would be to maybe make a release of your current version (instructions here), because this is the published one and is working.
You might also want to consider making sure the users install the latest release version with devtools (see here).
Then, maybe create a development branch (e.g. called devel
) where one can be testing these other implementations without messing up the master branch.
Regarding the second question, I'll open a new issue for more discussion :smile:
Different bulkSizes have been added to QTLseqr v7.0.0
(before moving on to the issue, firstly thanks a lot for developing this very convenient package!)
In
runQTLseqAnalysis()
the optionbulkSize
only uses an integer of length 1. Therefore, if I understood the code rightly, the allele frequency simulation samples frequencies assuming both bulks have the same number of individuals.However, in practice the two bulks might not have the same number of individuals. Perhaps it should be considered that the user could provide either one or two values for
bulkSize
. Then, instead of only generating one vector of "null" allele frequencies (temp_freq
), two vectors could be generated, one for each pool?Something like this might work?:
Also, is it the case that this parameter should be the number of genome copies pooled, which for a diploid it would be 2x number of individuals pooled? If so, the documentation could be a bit more explicit about this, maybe?
I'm glad to fork and implement the suggestion above, but just wanted to check if it makes sense.