Organize documentation according to user stories

hackdna commented 4 years ago

This is partially based on the suggestion by @naumenko-sa. The main idea is to organize documentation around goals and workflows for particular use cases of different types of users.

Types of users:

Researcher (private bcbio installation on a local machine or HPC environment with a few reference data sets)
System administrator (shared bcbio installation in an HPC environment with most or all reference data)
Developer (minimal installation in a virtual machine running on a personal machine with no reference data by default)

User stories:

installation: quick start (VM and local machine) and production (HPC and cloud)
updating and maintenance: code and data
example analysis and project structure: small test run for bcbio RNA-seq(?)
small variants germline
SV/CNV for WGS
somatic variant calling
methylation
smallRNA
bulk RNA-seq
scRNAseq
ChIP-seq/ATAC-seq
teaching(?)

Also, some reference materials:

general troubleshooting
performance tuning
bcbio configuration parameters, inputs, internals, and outputs
application architecture, testing, development environment config, style guide for code and docs
CWL(?)

naumenko-sa commented 4 years ago

Thanks @hackdna for starting this discussion!

Here is a single cell RNA-seq user story, bringing all parts of it together from different documentation chapters: https://bcbio-nextgen.readthedocs.io/en/latest/contents/single_cell.html

In general, I'd propose to treat everything as a user story, i.e. switching from the structural description to the user experience. Issues would reference user stories, feed them and improve them. Another general point is to keep docs as succinct as possible. A user story could contain: workflow/parameters/output/validation/steps/references, but it might be flexible, as even analysis stories are very different.

In particular:

3 types of users. Bcbio is of no use without many packages (conda) and data for at least one reference genome. I hope nobody is running bcbio analyses on a laptop (poor laptop). Any real NGS dataset requires at least a server. There is not much difference between a single user of bcbio on a server and a shared installation on a cluster. There are just 3 widely used references: hg38, grch37, mm10. So the difference between 'Researcher' and 'System administrator' users is really blurred. Users who run bcbio in production at 100 samples a week have their own wrappers and infrastructure, integrated with LIMS and other databases; that is all out of bcbio's scope. Our typical user is Researcher/Sysadmin, a researcher-bioinformatician who is able to install and run bcbio on a Linux server. I'd propose to focus on this imaginary user and avoid splitting our resources to maintain 2 user types. We can reflect this distinction in the Installation user story: Quick installation on top, and more details below for the 'Admin' user.

Developer user type: number-wise, the current ratio of Researcher/sysadmins to devs is 100:1. That pushes us to move all dev stuff into the 'Development' user story, as it is not relevant to 99% of people who read the docs. 99% of our users should easily see what they need (how to run their analyses), access it in 1 click, and be able to run the analysis themselves without raising an issue. It might not be super convenient for a few developers. We can have any type of structure under the 'Development' user story. Once the users:developers ratio reaches 50:50, we could lift it up.

User stories:

2.1 Quick start - quick installation + example analysis = WES NA12878 validation, just to give the user a sense of what bcbio is
2.2 Installation (what is already there + experience accumulated in O2 installation, installation of code, of data, of conda packages, troubleshooting conda packages, updating and maintenance: code and data)
2.3 scRNAseq
2.4 Bulk RNA-seq (quick start, parameters, outputs, references)
2.5 small germline variant (quick start with trio, parameters, outputs, validations, references)
2.6 Structural variants (quick start, parameters, outputs, validations, references)
2.7 somatic variant calling
2.7.1 tumor only
2.7.2 tumor normal
2.7.3 UMIs
2.8 methylation
2.9 smallRNA
2.10 ChIP-seq/ATAC-seq
2.11 Development (installation on VM and local machine)

teaching(?) - I think we need to push teaching to respective user stories.
general troubleshooting - not sure what this is?
bcbio configuration parameters, inputs, internals, and outputs - configuration, inputs, outputs should be in the respective user stories, they are different per user story.
application architecture, testing, development environment config, style guide for code and docs - to development user story
CWL(?) - a separate user story that we really need to push fast
Presentations - we could keep them and add new ones there.

2.1, 2.3 are already in place. I can quickly help to rearrange 2.2, 2.4, 2.5, 2.6, 2.7 - they are in my head, at least as a first approximation. Then we can get more input from many people on how to improve.

2.8 - @hackdna ? @jnhutchinson ?
2.10 - @sjhosui @yoonsquared ?
2.11 - @hackdna
2.9 - ?

Our biggest gap currently is 7 - CWL. Maybe someone could pick it up and push? @hackdna @matthdsm?

Everybody is welcome to chime in now, as this reorganization of the docs will be in place soon.

Sergey

matthdsm commented 4 years ago

Hi Sergey,

What exactly would you need for CWL? The current docs are still quite up to date, barring some deprecated features. I think @chapmanb could provide more insight into the current status of CWL, since he did most of the work. I've only tested the CWL approach for short variants in exomes (with limited success, I might add), so I'm not really up to speed with where we are there.

Happy to help where I can, but I'm going to need some direction.

Cheers
M

hackdna commented 4 years ago

Thanks for the comment @naumenko-sa. Looks like we are mostly on the same page. As you've mentioned, the main audience is the researchers who are mostly interested in running analyses. However, the user types or personas are not based just on job titles but on the goals that folks are trying to accomplish. There are some researchers who have to install bcbio on a cluster for their groups ("sys admins") while others submit PRs with bug fixes and new features ("devs").

Also, being able to run bcbio in a Vagrant VM on a laptop with small input files is not only possible but essential to development, because it lets developers iterate quickly on the code using tools that are much more powerful and easier to use than what a CLI can provide.

Bcbio is a complex application with numerous moving parts and a steep learning curve. The main goal of the docs is to make getting started and learning bcbio as easy as possible. Even seasoned sys admins need help with deploying bcbio, and providing that help will drive adoption. Likewise, being able to quickly understand the bcbio software architecture and to have a nice development environment will encourage code contributions.

With these points in mind we can start with the template you've proposed and adapt by incorporating user feedback as needed.
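For illustration only, the user story template discussed in this thread (workflow/parameters/output/validation/steps/references) could be sketched as a small page-skeleton generator. This is a minimal sketch, not anything that exists in bcbio: the `UserStory` class, its `to_rst` method, and the choice of reStructuredText headings are all assumptions made for the example. Only the section names come from the thread.

```python
from dataclasses import dataclass

# Section names come from the thread; the rest of this sketch is hypothetical.
DEFAULT_SECTIONS = (
    "Workflow",
    "Parameters",
    "Output",
    "Validation",
    "Steps",
    "References",
)


@dataclass
class UserStory:
    """Hypothetical description of one documentation page (one user story)."""

    title: str
    sections: tuple = DEFAULT_SECTIONS

    def to_rst(self) -> str:
        """Render an empty page skeleton, assuming reStructuredText headings."""
        lines = [self.title, "=" * len(self.title), ""]
        for name in self.sections:
            # Each section gets an underlined heading and a placeholder body.
            lines += [name, "-" * len(name), "", "TODO", ""]
        return "\n".join(lines)


if __name__ == "__main__":
    # Stories may differ in structure, so sections can be overridden per story.
    story = UserStory(
        "Bulk RNA-seq",
        sections=("Quick start", "Parameters", "Outputs", "References"),
    )
    print(story.to_rst())
```

Keeping the section list overridable per story reflects the point made above that the template should stay flexible, since analysis stories differ a lot from one another.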

bcbio / bcbio-nextgen

Organize documentation according to user stories #3142