Closed (tinyheero closed this issue 4 years ago)
Hi Fong,
I actually have a script that does just that as part of hisatgenotype_toolkit. There is currently a bug with the file format for HISAT-genotype (stray carriage returns at the ends of lines) if you're on Linux, so please follow the steps outlined below:
Run the following, replacing the path in the cd command with the directory where you have hisatgenotype installed.
cd $DIR/$WITH/$HISATGENOTYPE/
for file in */*; do sed -i 's/\r$//' "$file"; done
Then you can run the following command:
hisatgenotype_toolkit conc-results --help
You should see the help screen for the conc-results script. This will format the output as a TSV or CSV.
Let me know if this works for you.
Thanks, Chris
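For readers who prefer Python, here is a minimal sketch of the same clean-up. It is not part of HISAT-genotype, and the function name strip_carriage_returns is made up for this example. One difference from the sed loop above: the glob */* only matches entries exactly one directory below the current one, so files at the top level of the install directory are not touched by it (the hisatgenotype_toolkit wrapper appears to live there, judging by the paths used later in this thread). The sketch walks the whole tree instead. It skips anything containing a NUL byte, on the assumption that such files are binaries that must not be rewritten.

```python
from pathlib import Path


def strip_carriage_returns(root):
    """Convert CRLF line endings to LF in text files under root.

    Returns the list of files that were rewritten.
    """
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        # Only regular files; leave symlinks and version-control data alone.
        if path.is_symlink() or not path.is_file():
            continue
        if ".git" in path.relative_to(root).parts:
            continue
        data = path.read_bytes()
        # Heuristic (an assumption of this sketch): a NUL byte means binary.
        if b"\0" in data:
            continue
        fixed = data.replace(b"\r\n", b"\n")
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Calling strip_carriage_returns on the install directory (for example the ~/hisatgenotype path used in this thread, expanded with os.path.expanduser) returns the files it changed, which also shows whether the carriage-return problem was present at all.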
Thanks Chris.
Are you referring to the directory with HISAT-genotype output or where it is installed? When I try to run:
hisatgenotype_toolkit conc-results --help
I get this error message:
: No such file or directory
I get this error with or without the sed command you mentioned above (if you were referring to applying that to where it was installed). I have the following in my ~/.bashrc:
export PATH=~/hisatgenotype:~/hisatgenotype/hisat2:$PATH
export PYTHONPATH=~/hisatgenotype/hisatgenotype_modules:${PYTHONPATH}
So it is finding the hisatgenotype executables fine.
Hi Fong,
Could you share the output of the following commands:
cd ~/hisatgenotype
ls -alh
ls -alh hisatgenotype_tools/
I'd like to see what is going on with your tools directory as the toolkit script should be reading the options from there.
To answer your first question: you should run that command in the directory where hisatgenotype is installed. It looks like you are not hitting the carriage return error, so it is something else.
Thanks, Chris
Hi Chris,
Here is the output:
$> ls -alh hisatgenotype_tools/
total 240K
drwxrwxr-x 2 f.chan f.chan 4.0K Jul 27 21:40 .
drwxrwxr-x 7 f.chan f.chan 4.0K Jul 27 21:14 ..
-rwxrwxr-x 1 f.chan f.chan 24K Jul 27 21:40 hisatgenotype_build_genome.py
-rwxrwxr-x 1 f.chan f.chan 13K Jul 27 21:40 hisatgenotype_call_variants.py
-rwxrwxr-x 1 f.chan f.chan 5.7K Jul 27 21:40 hisatgenotype_conc_results.py
-rwxrwxr-x 1 f.chan f.chan 29K Jul 27 21:40 hisatgenotype_convert_codis.py
-rwxrwxr-x 1 f.chan f.chan 6.8K Jul 27 21:40 hisatgenotype_extract_codis_data.py
-rwxrwxr-x 1 f.chan f.chan 42K Jul 27 21:40 hisatgenotype_extract_cyp_data.py
-rwxrwxr-x 1 f.chan f.chan 39K Jul 27 21:40 hisatgenotype_extract_RBG.py
-rwxrwxr-x 1 f.chan f.chan 5.1K Jul 27 21:40 hisatgenotype_extract_reads.py
-rwxrwxr-x 1 f.chan f.chan 4.2K Jul 27 21:40 hisatgenotype_extract_vars.py
-rwxrwxr-x 1 f.chan f.chan 20K Jul 27 21:40 hisatgenotype_legacy.py
-rwxrwxr-x 1 f.chan f.chan 7.6K Jul 27 21:40 hisatgenotype_locus.py
-rwxrwxr-x 1 f.chan f.chan 16K Jul 27 21:40 hisatgenotype_locus_samples.py
Hi Fong,
I'm not sure why the wrapper isn't reading the proper script. You can run the script directly from that folder, though, using the following command:
python ~/hisatgenotype/hisatgenotype_tools/hisatgenotype_conc_results.py --help
That should allow you to run the conc_results script directly without the wrapper looking for it. If you see the help screen, you can then use the --in-dir option to point at the directory that holds your hisatgenotype output. Let me know if this works for you.
Thanks, Chris
Thanks Chris. It worked. I actually ended up passing the wrapper as a script to python. So that might highlight where the error is:
$> hisatgenotype_toolkit
: No such file or directory
$> python ~/hisatgenotype/hisatgenotype_toolkit --help
usage: hisatgenotype_toolkit [-h] [--see-script]
{build-genome,call-variants,conc-results,convert-codis,extract-RBG,extract-codis-data,extract-cyp-data,extract-reads,extract-vars,legacy,locus,locus-samples}
...
HISAT-genotype Toolkit
optional arguments:
-h, --help show this help message and exit
--see-script Prints the exact script and options being run
Tools:
{build-genome,call-variants,conc-results,convert-codis,extract-RBG,extract-codis-data,extract-cyp-data,extract-reads,extract-vars,legacy,locus,locus-samples}
HISAT-genotype tools and individual scripts
build-genome hisatgenotype_build_genome.py
call-variants hisatgenotype_call_variants.py
conc-results hisatgenotype_conc_results.py
convert-codis hisatgenotype_convert_codis.py
extract-RBG hisatgenotype_extract_RBG.py
extract-codis-data hisatgenotype_extract_codis_data.py
extract-cyp-data hisatgenotype_extract_cyp_data.py
extract-reads hisatgenotype_extract_reads.py
extract-vars hisatgenotype_extract_vars.py
legacy hisatgenotype_legacy.py
locus hisatgenotype_locus.py
locus-samples hisatgenotype_locus_samples.py
$> python ~/hisatgenotype/hisatgenotype_toolkit conc-results --in-dir hisatgenotype_out
File: hisatgenotype_out/assembly_graph-hla-NA12892_extracted_1_fq_gz-hla-extracted-1_fq.report
Analysis - EM
Gene: A
A*02:01:01:01 (abundance: 51.95%)
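The transcript above shows direct execution failing with a message that starts with a bare colon, while passing the same file to python works. That pattern is consistent with a shebang line that ends in a carriage return: the system then looks for an interpreter whose name includes the invisible character, and the terminal hides the name when printing the error. This is only one plausible explanation and is not confirmed anywhere in this thread. A minimal sketch to check for it follows; the function name is made up for this example.

```python
def shebang_has_carriage_return(path):
    """Return True if the file starts with '#!' and its first line ends in CRLF."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")
```

If this returns True for the hisatgenotype_toolkit wrapper, removing the carriage returns from that file (not only from files inside subdirectories) should let it run directly again.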
Thanks. I guess the output file name is currently hardcoded to HG_report_results.csv (https://github.com/DaehwanKimLab/hisat-genotype/blob/master/hisatgenotype_tools/hisatgenotype_conc_results.py#L132). Are you open to pull requests to add a parameter to control that? I am planning to run this on numerous cases, so it's best if they don't overwrite the same file :)
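Until such a parameter exists, one workaround is to move the fixed-name file aside after each run. The sketch below assumes the report is written to the current working directory, which this thread does not confirm. The helper name archive_report and the reports destination folder are made up for this example.

```python
import shutil
from pathlib import Path

# Fixed output name mentioned in this thread.
REPORT_NAME = "HG_report_results.csv"


def archive_report(case_id, work_dir=".", dest_dir="reports"):
    """Move work_dir/HG_report_results.csv to dest_dir/<case_id>.csv.

    Returns the new path. Refuses to overwrite an existing archived report.
    """
    source = Path(work_dir) / REPORT_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; did conc-results run here?")
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{case_id}.csv"
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.move(str(source), str(target))
    return target
```

Calling archive_report with a distinct case identifier after each conc-results run keeps one CSV per case instead of a single file that is overwritten every time.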
Hi Fong,
I'm glad it's working for you. Sure thing! You're free to submit a pull request and I'll integrate it in. I'm happy to have any assistance from users in adding in useful parameters. If you have any other suggestions for any script, feel free to submit them as a feature request or a pull request and I'll be sure to work on adding it into the code base.
If this solves your issue, I'll go ahead and close this issue.
Thanks, Chris
Hi there,
I've been able to get HISAT-genotype running on the tutorial data (https://daehwankimlab.github.io/hisat-genotype/tutorials/). The output report (assembly_graph-hla-NA12892_extracted_1_fq_gz-hla-extracted-1_fq.report) contains the results in a format that isn't readily parsed by a downstream program. I was just wondering if there is a way to get the results in a more standard format? For instance, in a TSV file?
Kind regards,
Fong
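As a fallback for readers who need to parse this kind of text directly, here is a rough sketch that turns lines of the form "Gene: A" and "A*02:01:01:01 (abundance: 51.95%)" into TSV. The layout is assumed from the few lines quoted elsewhere in this thread and may not match every report. The function names are made up for this example, and the toolkit's own conc-results tool remains the supported route.

```python
import csv
import io
import re

# Matches e.g. "A*02:01:01:01 (abundance: 51.95%)"; any text before the
# allele name on the same line is ignored.
ALLELE_PATTERN = re.compile(r"(\S+)\s+\(abundance:\s*([\d.]+)%\)")


def report_to_rows(lines):
    """Collect (gene, allele, abundance_percent) tuples from report lines."""
    rows = []
    gene = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Gene:"):
            gene = line.split(":", 1)[1].strip()
            continue
        match = ALLELE_PATTERN.search(line)
        if match and gene is not None:
            rows.append((gene, match.group(1), float(match.group(2))))
    return rows


def rows_to_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "allele", "abundance_percent"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Lines that match neither pattern, such as the "File:" and "Analysis - EM" lines, are skipped, so the sketch tolerates extra text at the cost of silently ignoring anything it does not recognise.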