Parsing output into a CSV file

tinyheero commented 3 years ago

Hi @chbe-helix,

I am trying out the latest version of HISAT-genotype (1.3.2) and have been able to run it. I've previously been able to convert the output into a format that could be readily used for downstream analysis (https://github.com/DaehwanKimLab/hisat-genotype/issues/16).

There used to be a hisatgenotype_tools/hisatgenotype_conc_results.py script that I could use, but it no longer seems to exist. Based on the tutorial, it seems to have been replaced by hisatgenotype_toolkit parse-results. I gave this a try:

hisatgenotype_toolkit parse-results --in-dir output/hisatgenotype_out --csv

And I am getting this output:

Analysis - EM
  Gene: A
    A*03:01:01:01 (abundance: 37.61%)
    A*33:01:01 (abundance: 37.03%)
    A*33:03:23 (abundance: 14.53%)
    A*03:21N (abundance: 10.84%)
  Gene: B
    B*55:01:01 (abundance: 50.23%)
    B*14:02:01:03 (abundance: 27.82%)
    B*14:02:01:01 (abundance: 21.95%)
  Gene: C
    C*08:02:01:01 (abundance: 46.81%)
    C*07:02:01:03 (abundance: 26.60%)
    C*07:02:01:09 (abundance: 26.60%)
  Gene: DPA1
    DPA1*01:03:01:01 (abundance: 100.00%)
  Gene: DPB1
    DPB1*03:01:01 (abundance: 31.67%)
    DPB1*350:01 (abundance: 24.52%)
    DPB1*124:01 (abundance: 23.48%)
    DPB1*04:01:01:02 (abundance: 20.33%)
  Gene: DRB1
    DRB1*04:07:01 (abundance: 52.68%)
    DRB1*01:02:01 (abundance: 47.32%)
  Gene: DQA1
    DQA1*01:01:02 (abundance: 56.63%)
    DQA1*03:03:01:01 (abundance: 43.37%)
  Gene: DQB1
    DQB1*05:01:01:01 (abundance: 25.01%)
    DQB1*05:01:01:03 (abundance: 25.01%)
    DQB1*03:01:01:01 (abundance: 24.99%)
    DQB1*03:01:01:03 (abundance: 24.99%)
Analysis - Allele splitting
  Gene: A (score: 1.00)
    A*33 - Trimmed (score: 0.5156)
    A*03 - Trimmed (score: 0.4845)
    A*03:01:01:01 (score: 0.3761)
    A*33:01:01 (score: 0.3703)
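In the report above, the trimmed A*33 and A*03 scores in the "Allele splitting" section equal the summed EM abundances of the gene A alleles that share those first fields (37.03 + 14.53 = 51.56% and 37.61 + 10.84 = 48.45%). This thread doesn't say that parse-results computes them this way, so the sketch below only reproduces that arithmetic. It trims names as in the --trim example from the parse-results help text (A*01:01:01:01 trimmed to level 2 gives A*01:01), then sums abundances per trimmed name. `trim_allele` and `collapse_abundances` are hypothetical helpers, not part of HISAT-genotype.

```python
from collections import defaultdict
from typing import Dict


def trim_allele(allele: str, level: int) -> str:
    """Keep the gene name and the first `level` colon-separated fields,
    e.g. trim_allele("A*01:01:01:01", 2) -> "A*01:01" (the --trim example)."""
    gene, sep, fields = allele.partition("*")
    if not sep or level < 1:
        raise ValueError(f"cannot trim {allele!r} to level {level}")
    # Suffixes such as the "N" in A*03:21N are not treated specially:
    # they simply stay attached to the field they belong to.
    return gene + "*" + ":".join(fields.split(":")[:level])


def collapse_abundances(abundances: Dict[str, float], level: int) -> Dict[str, float]:
    """Sum the abundances of alleles whose trimmed names coincide (hypothetical helper)."""
    totals: Dict[str, float] = defaultdict(float)
    for allele, pct in abundances.items():
        totals[trim_allele(allele, level)] += pct
    return dict(totals)


if __name__ == "__main__":
    # EM abundances (%) for gene A, copied from the report above.
    gene_a = {"A*03:01:01:01": 37.61, "A*33:01:01": 37.03,
              "A*33:03:23": 14.53, "A*03:21N": 10.84}
    collapsed = collapse_abundances(gene_a, 1)
    print({name: round(pct, 2) for name, pct in collapsed.items()})
    # {'A*03': 48.45, 'A*33': 51.56}
```

The rounding only hides floating-point noise from the additions. The totals match the trimmed scores printed for gene A in this one sample.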

This is quite different from the previous output. In particular, even though the --help text says:

$> hisatgenotype_toolkit parse-results --help
usage: hisatgenotype_parse_results.py [-h] [--in-dir READ_DIR] [-t TRIM_LEVEL] [--csv] [--output-file OFILE]

Script for simplifying HISAT-genotype results

optional arguments:
  -h, --help            show this help message and exit
  --in-dir READ_DIR     Input directory (e.g. read_input) (default: (empty))
  -t TRIM_LEVEL, --trim TRIM_LEVEL
                        Trim allele to specific field level (example : A*01:01:01:01 trim 2 A*01:01)
  --csv                 Save Results as CSV dataframe
  --output-file OFILE   Path to the output CSV file

Using --csv doesn't seem to produce a CSV file. Is there a way to produce the previous output discussed here (https://github.com/DaehwanKimLab/hisat-genotype/issues/19), or some other tabular form I can use?
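If only the printed report is at hand, its layout in this thread is regular enough to turn into rows. This is a minimal sketch that assumes every report follows exactly the indentation and the "(abundance: N%)" / "(score: N)" patterns shown above. The function names and the column names are hypothetical and are not part of HISAT-genotype.

```python
import csv
import io
import re
from typing import List, Optional, Tuple

# Assumed line formats, taken from the report printed in this thread.
ANALYSIS_RE = re.compile(r"^Analysis - (?P<name>.+)$")
GENE_RE = re.compile(r"^\s+Gene: (?P<gene>\S+)")
ALLELE_RE = re.compile(
    r"^\s+(?P<allele>\S+)(?P<trimmed> - Trimmed)? "
    r"\((?P<metric>abundance|score): (?P<value>[\d.]+)%?\)$"
)

Row = Tuple[Optional[str], Optional[str], str, bool, str, float]


def parse_report(text: str) -> List[Row]:
    """Turn the printed report into (analysis, gene, allele, trimmed, metric, value) rows."""
    rows: List[Row] = []
    analysis: Optional[str] = None
    gene: Optional[str] = None
    for line in text.splitlines():
        if m := ANALYSIS_RE.match(line):
            analysis = m["name"]
        elif m := GENE_RE.match(line):  # also matches "Gene: A (score: 1.00)"
            gene = m["gene"]
        elif m := ALLELE_RE.match(line):
            rows.append((analysis, gene, m["allele"], m["trimmed"] is not None,
                         m["metric"], float(m["value"])))
        # Blank or unrecognised lines are skipped.
    return rows


def to_csv(rows: List[Row]) -> str:
    """Serialise parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["analysis", "gene", "allele", "trimmed", "metric", "value"])
    writer.writerows(rows)
    return buf.getvalue()
```

To use it, save the printed report to a file (for example by redirecting the command's standard output), pass its text to `parse_report`, and write the result of `to_csv` wherever it is needed. Percentages are stored without the "%" sign, so abundance and score values sit in the same numeric column.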

tinyheero commented 3 years ago

Ah. I just realized that specifying --csv writes the results to HG_report_results.csv, and that the file path can be set with the --output-file argument. The report printed to standard output confused me, as I expected it to be the CSV output.
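Because the table goes to a file rather than to standard output, scripting both steps from Python means running the command and then reading the file. This is a minimal sketch that uses only the flags listed in the --help output above. The helper names are hypothetical. The CSV's column layout isn't shown in this thread, so the reader keeps whatever header row the file has.

```python
import csv
import subprocess
from pathlib import Path
from typing import Dict, List


def run_parse_results(in_dir: str, out_csv: str = "HG_report_results.csv") -> None:
    """Run parse-results with --csv. The table is written to out_csv,
    while the human-readable report still goes to standard output."""
    subprocess.run(
        ["hisatgenotype_toolkit", "parse-results",
         "--in-dir", in_dir, "--csv", "--output-file", out_csv],
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )


def load_report(path: str = "HG_report_results.csv") -> List[Dict[str, str]]:
    """Read the CSV into one dict per row, keyed by the file's own header."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `run_parse_results("output/hisatgenotype_out")` followed by `load_report()` returns the rows as dictionaries. This assumes hisatgenotype_toolkit is on PATH. All values come back as strings, so numeric columns need explicit conversion.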

chbe-helix commented 3 years ago

Hi Fong,

Glad you found the solution. Sorry for the confusion.

Thanks, Chris

DaehwanKimLab / hisat-genotype, issue #32