DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Runner script suggestion #65

Closed by nickp60 5 years ago

nickp60 commented 6 years ago

Hi all, I want to suggest that a runner script could be added to blobtools. None of the inputs are necessarily something that users would have already prepared (metagenomic assembly, custom blast results, mapping file), and a script such as this one could help users go from reads and reference to plots with minimal head-scratching.

This blob_check.sh is meant to fill that gap (and it's what I use when I need to check for contamination with blobtools). It first partitions the reads into three sets: the full set, those aligning to a reference, and those failing to align. For each of those read sets, it performs an assembly with metaSPAdes, blasts the results, and generates a blobplot. It takes 5 args:

The script checks that the user has provided all the args, that the required tools are available in the PATH, and that the BLASTDB variable has been set. A more flexible and robust script could be made in Python or something; this is just my minimal implementation.
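For readers who have not seen this kind of pre-flight check, here is a minimal sketch of the three checks described above. It is not the actual blob_check.sh; the function names `require_nargs`, `require_tools` and `require_set` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of pre-flight checks; NOT the actual blob_check.sh.
# All function names below are made up for this example.

# Succeed only if the actual argument count equals the expected count.
require_nargs() {
    expected_nargs=$1
    actual_nargs=$2
    if [ "$actual_nargs" -ne "$expected_nargs" ]; then
        echo "ERROR: expected $expected_nargs arguments, got $actual_nargs" >&2
        return 1
    fi
}

# Succeed only if every named tool can be found in the PATH.
# Reports all missing tools, not just the first one.
require_tools() {
    missing_tools=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ERROR: required tool not found in PATH: $tool" >&2
            missing_tools=1
        fi
    done
    return "$missing_tools"
}

# Succeed only if the given value is non-empty; the name is used for the message.
require_set() {
    var_name=$1
    var_value=$2
    if [ -z "$var_value" ]; then
        echo "ERROR: variable $var_name is not set" >&2
        return 1
    fi
}
```

A runner script could call these at the top, for example `require_nargs 5 "$#" && require_tools blastn blobtools && require_set BLASTDB "${BLASTDB:-}" || exit 1`. The tool names in that call are assumptions about what such a script would need; the real list depends on the mapper and assembler it uses.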

Thanks for your time, and for such a great tool!

DRL commented 6 years ago

Hi nickp60,

nice script, well done! However, I am kind of reluctant to merge this in as is...

I will try to explain my reasons below and hopefully we can find a solution that increases global net happiness...

What I would suggest is that you:

And I would be very happy to reference you on the BlobTools GitHub/Readme.io as an example of "Workflow B" (as in Laetsch & Blaxter, 2017). That way it would be clear that your workflow is for people that do similar things to what you do. And this way you would take care of future maintenance obligations.

Does that make sense? I don't want to come across as 'gate-keeping' the BlobTools repo (I actually want people to contribute more). It is just that I have thought about the whole pre-made workflow business in the past and decided that it would cause more harm than good.

But I am happy to discuss this in more detail if needed.

cheers,

dom

nickp60 commented 6 years ago

Hi dom,

Thanks for your reply! I am well aware that it is a very limited/naive use case, and I have also been bitten by the side effects of too little head-scratching. I think the idea of a separate repo that could perhaps be mentioned in the docs would be a good way forward.

When I was first trying to get up and running with BlobTools, my biggest hurdle was that the documentation seemed to start partway through the process -- the user is expected to have a hits file. While it is great that the input is flexible enough to handle a hits file from many sources, it is a non-standard format, and it took me a while to go through the tutorial docs and the hits-file docs to figure out just how to get what I need from it. Perhaps it would be better if the workflow in this script were written up as a tutorial?

Now that I am more familiar with the whole workflow, I find myself needing to deviate little from one routine: comparing blobs from the reads that map to my reference with blobs from the reads that do not -- hence this script. It is a very limited use case, but perhaps with some tweaking, we could find a common ground between uber-specific and 100% modular.

What are the most common use cases that you hear about? I imagine it is probably all over the place, but are there any trends you are noticing, or things that are really picking up?

Thanks again for your reply!

~Nick