The --help section needs to be updated

Leonievb commented 2 years ago

Many of the argument descriptions shown with the --help flag are not informative enough. Often, the expected input format is missing.

marcelm commented 2 years ago

I have restructured the --help output a bit. The --highlight input format now has a description. As far as I can tell, the only other description that is missing is the one for --filter-cellids. I’ve opened #16 for that.
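For reference, the help output quoted further down describes the --highlight input as a text file with one cell ID per line. Here is a minimal Python sketch of reading that format. The function name `read_highlight_file` is hypothetical, and stripping whitespace and skipping blank lines are assumptions, not documented TREX behavior.

```python
def read_highlight_file(path: str) -> list[str]:
    """Return the cell IDs listed in a --highlight file, in file order.

    Hypothetical helper, not TREX code. Assumed format: plain text, one cell ID
    per line; surrounding whitespace is stripped and blank lines are skipped.
    """
    cell_ids = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            cell_id = line.strip()
            if cell_id:  # ignore blank lines
                cell_ids.append(cell_id)
    return cell_ids
```

If order does not matter, wrapping the result in `set(...)` gives fast membership checks when deciding which cells to highlight.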

We can surely expand the help texts a little bit further, but we should not turn --help into full-blown documentation. IMO, the actual explanation (if necessary) should be in the documentation. I would consider the --help text more as a reminder of which options exist.

Here is how it looks at the moment (the "Run on 10X data" help string still needs to be improved):

usage: trex run10x [-h] [--version] [--debug] [--genome-name NAME]
                   [--chromosome CHROMOSOME] [--start INT] [--end INT]
                   [--amplicon DIRECTORY [DIRECTORY ...]] [--samples SAMPLES] [--prefix]
                   [--min-length INT] [--max-hamming INT] [--jaccard-threshold VALUE]
                   [--filter-cellids CSV] [--keep-single-reads] [--visium]
                   [--output DIRECTORY] [--delete] [-l] [--umi-matrix] [--plot]
                   [--highlight FILE]
                   DIRECTORY [DIRECTORY ...]

Run on 10X data

positional arguments:
  DIRECTORY             Path to the input Cell Ranger directory. There must be an 'outs'
                        subdirectory in that directory.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --debug               Print some extra debugging messages

Input:
  --genome-name NAME    Name of the genome as indicated in 'cellranger count' run with
                        the flag --genome. Default: Auto-detected
  --chromosome CHROMOSOME, --chr CHROMOSOME
                        Name of chromosome on which clone ID is located. Default: Last
                        chromosome in BAM file
  --start INT, -s INT   Position of first clone ID nucleotide (1-based). Default: Auto-
                        detected
  --end INT, -e INT     Position of last clone ID nucleotide (1-based). Default: Auto-
                        detected
  --amplicon DIRECTORY [DIRECTORY ...], -a DIRECTORY [DIRECTORY ...]
                        Path to Cell Ranger result directory (a subdirectory 'outs' must
                        exist) containing sequencing of the clone ID amplicon library.
                        Provide these in the same order as transcriptome datasets
  --samples SAMPLES     Sample names separated by comma, in the same order as Cell
                        Ranger directories
  --prefix              Add sample name as prefix to cell IDs. Default: Add as suffix

Filter settings:
  --min-length INT, -m INT
                        Minimum number of nucleotides a clone ID must have. Default: 20
  --max-hamming INT     Maximum Hamming distance allowed for two clone IDs to be called
                        similar. Default: 5
  --jaccard-threshold VALUE
                        If the Jaccard index between clone IDs of two cells is higher
                        than VALUE, they are considered similar. Default: 0
  --filter-cellids CSV, -f CSV
                        CSV file containing cell IDs to keep in the analysis. This flag
                        can be used to remove cells, e.g. doublets
  --keep-single-reads   Keep clone IDs supported by only a single read. Default: Discard
                        them
  --visium              Adjust filter settings for 10x Visium data: Filter out clone IDs
                        only based on one read, but keep those with only one UMI
  --output DIRECTORY, -o DIRECTORY, --name DIRECTORY, -n DIRECTORY
                        Name of the run directory to be created by the program. Default:
                        trex_run
  --delete              Delete the run directory if it already exists

Optional output files:
  Use these options to enable creation of additional files in the output directory

  -l, --loom            Also create a loom file from Cell Ranger and clone data. The
                        file will have the same name as the run. Default: do not
                        create a loom file
  --umi-matrix          Create a UMI count matrix 'umi_count_matrix.csv' with cells as
                        columns and clone IDs as rows
  --plot                Plot the clone graph
  --highlight FILE      Highlight cell IDs listed in FILE (text file with one cell ID
                        per line) in the clone graph
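The grouped layout in this output (titled sections such as "Input:" and "Filter settings:", an optional group description, and the "optional arguments:" heading) is what Python's argparse prints when options are registered through argument groups. The output suggests, but this thread does not confirm, that TREX uses argparse. Under that assumption, here is a trimmed-down sketch of how such a parser could be declared. Only a few options are included, the help strings are copied from the output above, and the positional's dest name `path` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch, not TREX's source: prog, option names, metavars and
    # help strings are copied from the help output; the structure is assumed.
    parser = argparse.ArgumentParser(prog="trex run10x", description="Run on 10X data")
    parser.add_argument(
        "path",  # hypothetical dest name; only the metavar appears in --help
        metavar="DIRECTORY",
        nargs="+",
        help="Path to the input Cell Ranger directory. There must be an 'outs' "
        "subdirectory in that directory.",
    )

    # Each add_argument_group() call produces one titled section, e.g. "Input:".
    input_group = parser.add_argument_group("Input")
    input_group.add_argument(
        "--samples",  # default metavar is the upper-cased dest: SAMPLES
        help="Sample names separated by comma, in the same order as Cell Ranger directories",
    )
    input_group.add_argument(
        "--prefix",
        action="store_true",
        help="Add sample name as prefix to cell IDs. Default: Add as suffix",
    )

    filter_group = parser.add_argument_group("Filter settings")
    # The first option string ("--min-length") is the one shown in the usage line.
    filter_group.add_argument(
        "--min-length", "-m",
        metavar="INT",
        type=int,
        default=20,
        # argparse fills in %(default)s, so the help text cannot drift from the code.
        help="Minimum number of nucleotides a clone ID must have. Default: %(default)s",
    )
    filter_group.add_argument(
        "--max-hamming",
        metavar="INT",
        type=int,
        default=5,
        help="Maximum Hamming distance allowed for two clone IDs to be called "
        "similar. Default: %(default)s",
    )

    # A group description is printed directly below the group title.
    output_group = parser.add_argument_group(
        "Optional output files",
        description="Use these options to enable creation of additional files "
        "in the output directory",
    )
    output_group.add_argument(
        "--highlight",
        metavar="FILE",
        help="Highlight cell IDs listed in FILE (text file with one cell ID per line) "
        "in the clone graph",
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

On Python 3.10 and later, argparse labels the ungrouped section "options:" instead of "optional arguments:", so the same parser prints slightly different headings depending on the interpreter version.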

Leonievb commented 2 years ago

I agree with you. Except for the -f flag, everything is sufficiently explained in the --help output.

marcelm commented 2 years ago

Great, I’ll close this issue then.

frisen-lab / TREX, issue #9