genome / bam-readcount

Count bases in BAM/CRAM files
MIT License

[JOSS REVIEW] docker documentation #79

Closed friedue closed 2 years ago

friedue commented 3 years ago

Related: https://github.com/openjournals/joss-reviews/issues/3722

Very much appreciate the docker image, but I feel the user-friendliness could be improved. I was trying to get the docker image to work with the test data sets provided in test-data/, which I feel is not such an unusual use case for peeps who may have never used bam-readcount before and just want to get a sense of what it does and whether the installation worked. So I'd appreciate it if you could provide the code and/or update the Dockerfile (e.g. to provide the test data within the container) so that any new user can immediately perform a test run of the tool.

Here's what I ended up doing:

## manually downloaded the BAM and FA files from test-data/ to my Downloads/ directory
## then ran docker with the -v option 
$ docker run -v /Downloads/test-data/:/Downloads/test-data/ mgibio/bam-readcount -f /Downloads/test-data/ref.fa /Downloads/test-data/test.bam > /Downloads/test-data/testing_bamreadcounts 
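The same test run can be wrapped in a small helper so the host path does not have to be typed twice. This is only a minimal sketch: the function name `brc_test_run` and the container-side mount point `/data` are arbitrary choices for illustration, not something the image defines, and it assumes the directory passed in holds `ref.fa` and `test.bam` from test-data/.

```shell
# Sketch: run bam-readcount on the test data via Docker.
# Assumptions: "$1" is a host directory holding ref.fa and test.bam;
# /data is an arbitrary mount point inside the container (hypothetical name).
brc_test_run() {
    data_dir=$(cd "$1" && pwd) || return 1   # docker -v needs an absolute host path
    # --rm removes the container once bam-readcount exits
    docker run --rm -v "$data_dir:/data" mgibio/bam-readcount \
        -f /data/ref.fa /data/test.bam
}
```

For example, `brc_test_run ~/Downloads/test-data > testing_bamreadcounts` would write the counts to a file on the host, since the redirect is handled by the host shell rather than inside the container.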

In addition to the test data, I recommend adding one or two more lines of example code for actually running the container -- took me a while to figure out how to get an interactive session running, which is my preferred way of using docker

$ docker pull mgibio/bam-readcount

# `run` will create, start and attach the container
$ docker run mgibio/bam-readcount

# it may be more intuitive to run the container in interactive mode
# here we need to override the default entrypoint and set the new entrypoint to /bin/bash
$ docker run -it --entrypoint /bin/bash mgibio/bam-readcount
# the command above creates an interactive session, i.e. it drops you inside the container, allowing you to interact with it
# you can check that this happened via 
$ pwd
/opt/bam-readcount

## to run bam-readcount in the interactive session
$ /usr/bin/bam-readcount
Usage: bam-readcount [OPTIONS] <bam_file> [region]
Generate metrics for bam_file at single nucleotide positions.
Example: bam-readcount -f ref.fa some.bam

Available options:
  -h [ --help ]                         produce this message
  -v [ --version ]                      output the version number
  -q [ --min-mapping-quality ] arg (=0) minimum mapping quality of reads used 
                                        for counting.
  -b [ --min-base-quality ] arg (=0)    minimum base quality at a position to 
                                        use the read for counting.
  -d [ --max-count ] arg (=10000000)    max depth to avoid excessive memory 
                                        usage.
  -l [ --site-list ] arg                file containing a list of regions to 
                                        report readcounts within.
  -f [ --reference-fasta ] arg          reference sequence in the fasta format.
  -D [ --print-individual-mapq ] arg    report the mapping qualities as a comma
                                        separated list.
  -p [ --per-library ]                  report results by library.
  -w [ --max-warnings ] arg             maximum number of warnings of each type
                                        to emit. -1 gives an unlimited number.
  -i [ --insertion-centric ]            generate indel centric readcounts. 
                                        Reads containing insertions will not be
                                        included in per-base counts
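
The interactive session becomes more useful once host data is mounted into it, combining the `-v` and `--entrypoint` options used above. A minimal sketch under the same assumptions as before: `brc_shell` and the mount point `/data` are hypothetical names chosen for illustration.

```shell
# Sketch: open an interactive shell in the container with host data mounted.
# Assumptions: "$1" is a host directory with your BAM and FASTA files;
# /data is an arbitrary mount point, not something the image defines.
brc_shell() {
    data_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute host path
    # override the default entrypoint so we land in bash instead of bam-readcount
    docker run --rm -it -v "$data_dir:/data" \
        --entrypoint /bin/bash mgibio/bam-readcount
}
```

After `brc_shell ~/Downloads/test-data`, something like `/usr/bin/bam-readcount -f /data/ref.fa /data/test.bam` should then work from inside the container, assuming those two files are in the mounted directory.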

Since you have a specific file dedicated to the docker documentation, I don't see a lot of harm in adding these details.

chrisamiller commented 2 years ago

The Docker image at https://hub.docker.com/r/mgibio/bam-readcount has been updated and now contains some test data and usage examples that should make onboarding easier for folks.

apldx commented 2 years ago

@friedue, thank you for your detailed suggestions. We use ENTRYPOINT because some tools rely on it, but I've switched the link to point to the dedicated https://github.com/genome/docker-bam-readcount repo and added some documentation there that covers a few use cases.

friedue commented 2 years ago

that looks great