dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

Start formalizing documentation / protocol standards #743

Open mr-c opened 9 years ago

mr-c commented 9 years ago

with examples in RST and MD

ctb commented 9 years ago

On Tue, Jan 27, 2015 at 10:14:19AM -0800, Michael R. Crusoe wrote:

with examples in RST and MD

Why are we doing protocols in MD? That is un-khmer-ly.

mr-c commented 9 years ago

RST alone is fine

(In reply to C. Titus Brown's comment of Tue Jan 27 2015, 1:16:45 PM.)

ctb commented 9 years ago

A few things coming up in https://github.com/ged-lab/khmer-protocols/pull/148/ -- cc @drtamermansour.

One issue is our relationship to Amazon vs. Rackspace vs. MSU HPC. I think we have to pick a platform. I'm +1 on Amazon, with everything designed by default around Ubuntu 14.04 running on Amazon. I have no objection to thinking about a templating system that would let us generalize our protocols across multiple systems, but I'd like to talk about it before implementing. Thoughts?

mr-c commented 9 years ago

+1 for Ubuntu 14.04; installs using Debian packages; data mounted/linked into homedir. Remaining software also installed to homedir without root.

This is the most portable way. We can provide instructions for Rackspace, et cetera, by swapping out the setup instructions. Also allows for running the protocols on Jenkins, as part of acceptance testing, and as part of Debian's autopkgtest efforts.

See my work so far with Eelpond: https://github.com/ged-lab/khmer-protocols/pull/147/files

(In reply to C. Titus Brown's comment of Mon Feb 09 2015, 11:49:25 AM.)

drtamermansour commented 9 years ago

There are two virtualization types: the older paravirtual (PV) type and the newer hardware virtual machine (HVM) type. Amazon recommends HVM ( http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html ). However, the M1 instances we used to use do not support Linux HVM AMIs.

Does it make a difference whether we standardize on one virtualization type?

Side question: are there preferred instance types?

(In reply to Michael R. Crusoe's comment of Mon, Feb 9, 2015, 6:57 AM.)

mr-c commented 9 years ago

@drtamermansour That's relatively easy to change later; I wouldn't worry about it for now.

When we are ready to do performance analysis we'll be able to come up with a good answer to this question.

ctb commented 9 years ago

A few things I noted --

My primary goal with the protocols is for them to be tutorials for doing practical data analysis. This may conflict with their use as acceptance tests; whenever it does, we should choose the tutorial route, i.e., the less correct way that is easier to understand (cf. unnecessary use of multiple directories, unnecessary use of environment variables, extra jargon).

For the same reason, I've chosen not to be overly concerned with cost to the user; we should choose a more expensive machine if it gets the job done with less latency. Obviously this involves judgment calls and tradeoffs.

More as I walk through the protocols :)

mr-c commented 9 years ago

Now that I have more experience with chroots, I'm fine with leaving in commands that run as root; that won't hamper the automated testing.

Agreed that we need to favor understandability.

Here are changes I made in support of that:

Sounds like you're -1 on explicitly changing into the directory at the beginning of each command block. That's fine with me.

Likewise, it sounds like you'd like to avoid environment variables such as ${HOME} and ${THREADS}. I'm okay with taking those out as well.

I was playing around with the parallel command and left implementations using it as comments. Those can be deleted (I wasn't expecting my pull request to be merged yet).

Here are some writing/formatting guidelines that I implemented:

I also request the following stylistic change, as it improves readability in vim's reStructuredText mode:

Here's an example: https://github.com/ged-lab/khmer-protocols/pull/147/files#diff-888023f91d33aa4866c717764a7248a5L167

Other notes:

mr-c commented 8 years ago

@ctb https://github.com/dib-lab/khmer/issues/743#issuecomment-76225989 is relevant to our meeting today