Closed natir closed 2 years ago
Quick response before working on changes on joss
branch:
fasta/q
out of the box. I saw you are working on this for rust-bio
for an upcoming release? Do you think it is worth including both?Filtlong
differences as wellThanks very much!
Quick response to some quick response:
1 -> We agree that all you need is a fast·a·q parser ? Needletail support fast·a·q out of box, with gzip, bzip, lzma compression or not, support multi-thread and it's faster than rust-bio. Why did you want keep rust-bio parser ? 5 -> Yes sure !
That was quick!
Mainly because I really like rust-bio, but I see your point :) I should contribute the filter function to rust-bio-tools instead
I like rust-bio too but for your needs isn't the crate you need
Indeed, needletail
is pretty amazing
@natir my apologies for the long pause, and thanks for your patience. Some uhm interesting times.
I have rewritten the code base and will go through your review after adding tests over the weekend. Thanks again for the suggestions, they helped improve nanoq
a lot.
In regards to your points @natir
rust-bio
parser in favor of needletail
parserneedletail
parser nowrbt sequence-stats
in benchmarks and explanation Nanoq
is really fast now (--fast
) especially when ignoring quality scores :tada:
Output is compliant between all tools, except that nanoq
returns an integer for average read length (which is the correct way to return an average of read length in my opinion, as there are no fractional nucleotide bases in reality)
Did you want me to check anything else regarding your last point @natir?
First thank you for many works you do, nanoq and paper seems many ways better than before.
Yes and I have some other comments, on paper.
In paper you show, nanoq is faster than any other to get stats and memory usage is reasonable and it's nice. But you didn't compare effect of your filter compare to other, just a plot or table to show read length and read quality after filter. A very interesting bonus, could be if we can see filtering improve down stream analysis, like assembly, species detection, variant calling (make your choice).
Another point, when I read section Applications I feel read a user manual, I think a scientific paper isn't the good place for a user manual. nanoq is able work on active nanopore run is very nice, but you can move this information in another section.
You produce a very expensive comparison of nanoq to other tools, and it's nice but a small text to resume this figure and table could be nice. Just few sentence to say nanoq is faster than any other tools and memory usage is low and constant. I also think it's not need to have all condition in main text, you can just let major result in main text and put other in appendix.
Another recommend change could be replace, long and not so easy to read table, by a scatter plot with computation and memory usage as axes. You lost speedup multiplication but you can add it in caption or in main text and you can put some conditions in the same figure.
I think with all this change paper could be very very nice, but if you @bovee or @luizirber think it's too much change I didn't want block publication.
Just to resume and follow decision on my point (If you wont fix a point just check it)
I also recently discover seqtk fqchk
have very similar functionally to nanoq stat
. seqtk is a very famous tools compare nanoq to this one could be nice but just cite it is ok too.
Appreciate the additional comments on this @natir! I would argue that showing filter results is a bit redundant (it's not very exciting to show the same outputs) and downstream improvement of other applications is not really necessary (and a bit too much for this paper).
Overall these are very basic operations done routinely on nanopore data (from my experience). Nanoq
is mostly trying to do this faster than other tools. I think a comparison with seqtk fqchk
would be great in that context.
Agree with the other points, not sure if scatter plot is the right way to show the results to be honest (the data types don't strike me as appropriate for scatter plots, as they generally show relationships between variables). Perhaps an annotated barplot would be better, and only the full dataset benchmark?
I will make tables and figures more legible, condense the benchmarks, and remove the subset benchmarks to the README
and generally brush this more into paper format.
For the filter effect it's ok for me.
About usage of scatter plot to show cpu and memory usage, I agree it's not a perfect fits, with the purpose of plot. But it's really interesting, because you can have different condition on same plot (with different marker), and it's easy to spot the best tools because it's in the left bottom corner. But it's your call.
Thanks for the patience @natir - I made the following changes to the paper:
seqtk fq
Ok I didn't write all of these in this way but it's your paper, information I think are important is present in paper.
Just a point, author of niffler are Luiz Iber and I.
Oh sorry for this! Let me add you in the Acknowledgements as well
Thanks excellent, fixed now. Will close comment but feel free to open or comment if there was anything else
This relates to my JOSS review openjournals/joss-reviews#2991
First thanks for your work on this tools and publications
Some comments about paper:
rust-bio-stats sequence-stats
to nanoq stats functionality is very similar