DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

No genotype_genome_backbone.fa file found #15

Closed yubau1112 closed 3 years ago

yubau1112 commented 4 years ago

hisatgenotype --base genotype_genome -p 20 -1 HLA_H9-0193QC_S1_R1_001.fastq -2 HLA_H9-0193QC_S1_R2_001.fastq
1: Extracting reads from HLA_H9-0193QC_S1_R1_001.fastq
No genotype_genome_backbone.fa file found
Building genotype_genome Database

2020-07-14_hisat-genotype.log

> Base and Files:genotype_genome HLA_H9-0193QC_S1_R1_001.fastq-genotype_genome-extracted-1.fq.gz HLA_H9-0193QC_S1_R1_001.fastq-genotype_genome-extracted-2.fq.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/yubau/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/yubau/hisatgenotype/hisatgenotype_modules/hisatgenotype_typing_core.py", line 2330, in genotyping_locus
    verbose >= 1)
  File "/home/yubau/hisatgenotype/hisatgenotype_modules/hisatgenotype_typing_common.py", line 45, in wrapper
    func(*args, **kwargs)
  File "/home/yubau/hisatgenotype/hisatgenotype_modules/hisatgenotype_typing_common.py", line 519, in extract_database_if_not_exists
    verbose)
  File "/home/yubau/hisatgenotype/hisatgenotype_modules/hisatgenotype_typing_process.py", line 345, in extract_vars
    assert base_fname in unspliced_gene
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yubau/hisatgenotype/hisatgenotype", line 661, in typing_process
    ofnlog.write(str(x.get()))
  File "/home/yubau/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AssertionError

thanks!
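
The log shows two tracebacks because the assertion fails inside a multiprocessing worker and is raised again in the parent when `x.get()` is called. A minimal sketch of that pattern follows. It uses `ThreadPool` as a stand-in so it runs anywhere (a process `Pool` additionally attaches the worker's traceback as `RemoteTraceback`), and `genotyping_locus` and `known_names` here are hypothetical toy versions, not the HISAT-genotype code.

```python
from multiprocessing.pool import ThreadPool


def genotyping_locus(base_fname, known_names):
    # Toy stand-in for the failing check in the traceback (hypothetical names).
    assert base_fname in known_names
    return base_fname


def run_one(base_fname, known_names):
    """Run one task in a pool; return its result, or None if the assertion failed."""
    with ThreadPool(1) as pool:
        result = pool.apply_async(genotyping_locus, (base_fname, known_names))
        try:
            # get() re-raises in the caller whatever the worker raised.
            return result.get()
        except AssertionError:
            return None
```

In this toy version, `run_one("genotype_genome", {"hla"})` returns `None` because the name is not in the set, which mirrors the `assert base_fname in ...` failure reported above.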

chbe-helix commented 4 years ago

Hi Yubau,

Sorry for the late reply. If you are using the genotype genome then you shouldn't need to set the --base option. The --base option is only for when you want to restrict the analysis to a gene group, like HLA.

Let me know if this helps.

Thanks, Chris

yubau1112 commented 4 years ago

Hi, I am a clinical bioinformatics programmer from Taiwan. I have used HLAscan, xHLA, HLA-HD, Optitype, and HLA-LA for HLA typing. HLA typing is all I need right now, and I recently found HISAT-genotype.

I executed two commands (I don't know which one is correct):

python3.7 hisatgenotype_tools/hisatgenotype_build_genome.py -p 20 --base genotype_genome --locus-list genotype_genome
python3.7 hisatgenotype_tools/hisatgenotype_build_genome.py -p 20 --base genotype_genome --locus-list hla

Whenever the program reported that a file was not found, I ran 'touch file' to create an empty one. I know this is not a good method, but it works!

Here is the current file listing of my directory:

-rw-rw-r--   1 yubau yubau  285 Jul 14 13:46 AUTHORS
drwxrwxr-x   2 yubau yubau 4.0K Jul 14 13:46 etc
-rw-rw-r--   1 yubau yubau 3.0G Jul 14 14:26 genome.fa
-rw-rw-r--   1 yubau yubau 6.3K Jul 14 14:27 genome.fa.fai
-rw-r--r--   1 yubau yubau 807M Jul 14 16:51 genotype_genome.1.ht2
-rwxr-xr-x   1 root  root  6.0G Jul 14 13:53 genotype_genome_20180128.tar.gz
-rw-r--r--   1 yubau yubau 703M Jul 14 16:51 genotype_genome.2.ht2
-rw-r--r--   1 yubau yubau  12K Jul 14 16:36 genotype_genome.3.ht2
-rw-r--r--   1 yubau yubau 703M Jul 14 16:36 genotype_genome.4.ht2
-rw-r--r--   1 yubau yubau 1.2G Jul 14 16:53 genotype_genome.5.ht2
-rw-r--r--   1 yubau yubau 716M Jul 14 16:53 genotype_genome.6.ht2
-rw-r--r--   1 yubau yubau   12 Jul 14 16:36 genotype_genome.7.ht2
-rw-r--r--   1 yubau yubau    8 Jul 14 16:36 genotype_genome.8.ht2
-rw-r--r--   1 yubau yubau    0 Jan 29  2018 genotype_genome.allele
-rw-rw-r--   1 yubau yubau    0 Jul 14 15:03 genotype_genome_backbone.fa
-rw-r--r--   1 yubau yubau    0 Jan 29  2018 genotype_genome.clnsig
-rw-r--r--   1 yubau yubau 3.9K Jul 14 16:35 genotype_genome.coord
-rw-r--r--   1 yubau yubau 3.0G Jul 14 16:35 genotype_genome.fa
-rw-r--r--   1 yubau yubau 6.3K Jul 14 16:35 genotype_genome.fa.fai
-rw-r--r--   1 yubau yubau    0 Jan 29  2018 genotype_genome.haplotype
-rw-r--r--   1 yubau yubau    0 Jul 14 16:10 genotype_genome.index.snp
-rw-r--r--   1 yubau yubau    0 Jan 29  2018 genotype_genome.link
-rw-r--r--   1 yubau yubau    0 Jul 14 16:10 genotype_genome.locus
-rw-r--r--   1 yubau yubau    0 Jan 29  2018 genotype_genome.partial
-rw-rw-r--   1 yubau yubau    0 Jul 14 15:07 genotype_genome_sequences.fa
-rw-r--r--   1 yubau yubau    0 Jul 14 16:10 genotype_genome.snp
drwxrwxr-x   9 yubau yubau 4.0K Jul 14 13:46 .git
-rw-rw-r--   1 yubau yubau   29 Jul 14 13:46 .gitattributes
-rw-rw-r--   1 yubau yubau  610 Jul 14 13:46 .gitignore
-rw-rw-r--   1 yubau yubau  120 Jul 14 13:46 .gitmodules
drwxr-xr-x   2 yubau yubau 4.0K Mar 17  2016 grch38
drwxrwxr-x  13 yubau yubau 8.0K Jul 14 14:24 hisat2
-rwxrwxr-x   1 yubau yubau  30K Jul 14 13:46 hisatgenotype
drwxrwxr-x   6 yubau yubau   65 Jul 14 14:13 hisatgenotype_db
drwxrwxr-x   3 yubau yubau 4.0K Jul 14 13:56 hisatgenotype_modules
drwxrwxr-x   2 yubau yubau 4.0K Jul 14 20:58 hisatgenotype_out
-rwxrwxr-x   1 yubau yubau 4.4K Jul 14 13:46 hisatgenotype_toolkit
drwxrwxr-x   3 yubau yubau 4.0K Jul 14 15:57 hisatgenotype_tools
-rw-rw-r--   1 yubau yubau 154K Jul 14 16:09 hla.allele
-rw-rw-r--   1 yubau yubau 185K Jul 14 16:09 hla_backbone.fa
-rw-rw-r--   1 yubau yubau  27M Jul 14 16:22 hla.graph.1.ht2
-rw-rw-r--   1 yubau yubau 9.2M Jul 14 16:22 hla.graph.2.ht2
-rw-rw-r--   1 yubau yubau  242 Jul 14 16:18 hla.graph.3.ht2
-rw-rw-r--   1 yubau yubau  46K Jul 14 16:18 hla.graph.4.ht2
-rw-rw-r--   1 yubau yubau 538K Jul 14 16:25 hla.graph.5.ht2
-rw-rw-r--   1 yubau yubau 143K Jul 14 16:25 hla.graph.6.ht2
-rw-rw-r--   1 yubau yubau 1.5M Jul 14 16:18 hla.graph.7.ht2
-rw-rw-r--   1 yubau yubau  77K Jul 14 16:18 hla.graph.8.ht2
-rw-rw-r--   1 yubau yubau 168M Jul 14 16:25 hla.graph.rf
-rw-r--r--   1 root  root  126M Jul 14 14:00 HLA_H9-0193QC_S1_R1_001.fastq
-rw-r--r--   1 root  root  126M Jul 14 14:00 HLA_H9-0193QC_S1_R2_001.fastq
-rw-rw-r--   1 yubau yubau 2.3M Jul 14 16:09 hla.haplotype
-rw-rw-r--   1 yubau yubau 368K Jul 14 16:09 hla.index.snp
-rw-rw-r--   1 yubau yubau 4.1M Jul 14 16:09 hla.link
-rw-rw-r--   1 yubau yubau 2.7K Jul 14 16:09 hla.locus
-rw-rw-r--   1 yubau yubau 135K Jul 14 16:09 hla.partial
-rw-rw-r--   1 yubau yubau  92M Jul 14 16:09 hla_sequences.fa
-rw-rw-r--   1 yubau yubau 493K Jul 14 16:09 hla.snp
-rw-rw-r--   1 yubau yubau 178K Jul 14 16:09 hla.snp.freq
-rw-rw-r--   1 yubau yubau   56 Jul 14 16:00 hla.version
-rwxr-xr-x   1 root  root  145M Jul 15 08:29 ILMN.tar.gz
-rw-rw-r--   1 yubau yubau  35K Jul 14 13:46 LICENSE
-rw-rw-r--   1 yubau yubau  857 Jul 14 13:46 README.md
-rw-rw-r--   1 yubau yubau 1.9K Jul 14 13:46 setup.sh
-rw-rw-r--   1 yubau yubau    7 Jul 14 13:46 VERSION
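
Several `genotype_genome` files in the listing above are 0 bytes. A small sketch for finding such files is given below, assuming only that the index files share the `genotype_genome` prefix; the function name is hypothetical. Some of the empty files carry the January 2018 date and so may be empty by design, so the result only narrows down which files to inspect.

```python
from pathlib import Path


def empty_index_files(directory, prefix="genotype_genome"):
    """Return the sorted names of zero-byte regular files that start with `prefix`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.stat().st_size == 0
    )
```

Calling `empty_index_files(".")` in the working directory lists the candidates; files created with `touch` would show up here alongside any that are legitimately empty.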

Then I ran: hisatgenotype --base hla -p 4 -1 HLA_H9-0193QC_S1_R1_001.fastq -2 HLA_H9-0193QC_S1_R2_001.fastq

I got results!

# VERSIONS:
# HISAT2 - 2.2.0

# HISAT-genotype - 1.3.0

# Database - Database hla derived from HISATgenotype DB version: NONE
# COMMAND:
/home/yubau/hisatgenotype/hisatgenotype --base hla -p 20 -1 HLA_H9-0193QC_S1_R1_001.fastq -2 HLA_H9-0193QC_S1_R2_001.fastq
        A B C DMA DMB DOA DOB DPA1 DPB1 DPB2 DQA1 DQB1 DRA DRB1 E F G H HFE K L MICA MICB TAP1 TAP2 V

                hisat2 graph
                        7717 reads and 3929 pairs are aligned
                                1 A*02:07:01 (count: 3731)
                                2 A*02:01:01:01 (count: 3724)
                                3 A*02:06:01:01 (count: 3714)
                                4 A*02:01:111 (count: 3639)
                                5 A*02:93:02 (count: 3628)
                                6 A*02:549 (count: 3613)
                                7 A*02:01:120 (count: 3612)
                                8 A*02:01:124 (count: 3611)
                                9 A*02:648 (count: 3611)
                                10 A*02:01:01:07 (count: 3609)

                                1 ranked A*02:07:01 (abundance: 31.49%)
                                2 ranked A*02:01:01:01 (abundance: 23.99%)
                                3 ranked A*02:06:01:01 (abundance: 23.69%)
                                4 ranked A*02:474 (abundance: 20.82%)
                        11945 reads and 6081 pairs are aligned
                                1 B*46:01:01 (count: 3324)
                                2 B*51:01:01:01 (count: 3279)
                                3 B*51:02:01 (count: 3162)
                                4 B*46:66 (count: 3156)
                                5 B*51:56:03 (count: 3147)
                                6 B*51:13:01 (count: 3125)
                                7 B*15:01:01:01 (count: 3124)
                                8 B*15:11:01 (count: 3115)
                                9 B*15:15 (count: 3115)
                                10 B*15:24:01 (count: 3111)

                                1 ranked B*51:01:01:01 (abundance: 51.31%)
                                2 ranked B*46:01:01 (abundance: 48.69%)
                        14840 reads and 7533 pairs are aligned
                                1 C*01:02:01 (count: 5458)
                                2 C*14:02:01:01 (count: 5370)
                                3 C*01:106 (count: 5274)
                                4 C*01:110 (count: 5259)
                                5 C*01:03 (count: 5254)
                                6 C*01:02:36 (count: 5246)
                                7 C*01:93 (count: 5246)
                                8 C*01:67 (count: 5243)
                                9 C*01:02:29 (count: 5242)
                                10 C*01:08 (count: 5241)

                                1 ranked C*01:02:01 (abundance: 51.62%)
                                2 ranked C*14:02:01:01 (abundance: 48.38%)
                        89076 reads and 45254 pairs are aligned
                                1 DPB1*02:01:02 (count: 29671)
                                2 DPB1*02:02 (count: 29331)
                                3 DPB1*46:01:01 (count: 29267)
                                4 DPB1*16:01:01 (count: 28925)
                                5 DPB1*463:01 (count: 28794)
                                6 DPB1*04:02:01:02 (count: 28097)
                                7 DPB1*59:01 (count: 28068)
                                8 DPB1*135:01 (count: 27167)
                                9 DPB1*141:01 (count: 27134)
                                10 DPB1*02:01:04 (count: 27107)

                                1 ranked DPB1*05:01:01 (abundance: 55.31%)
                                2 ranked DPB1*02:01:02 (abundance: 44.69%)
                        39 reads and 20 pairs are aligned
                                1 DPB2*01:01:02 (count: 13)
                                2 DPB2*01:01:01 (count: 12)
                                3 DPB2*03:01:01:03 (count: 11)
                                4 DPB2*03:01:01:02 (count: 10)
                                5 DPB2*03:01:01:01 (count: 10)
                                6 DPB2*02:01 (count: 7)

                                1 ranked DPB2*01:01:01 (abundance: 20.00%)
                                2 ranked DPB2*01:01:02 (abundance: 20.00%)
                                3 ranked DPB2*02:01 (abundance: 20.00%)
                                4 ranked DPB2*03:01:01:03 (abundance: 20.00%)
                                5 ranked DPB2*03:01:01:02 (abundance: 10.00%)
                                6 ranked DPB2*03:01:01:01 (abundance: 10.00%)
                        76111 reads and 38681 pairs are aligned
                                1 DQB1*03:03:02:02 (count: 36687)
                                2 DQB1*03:195 (count: 35862)
                                3 DQB1*03:03:04 (count: 35829)
                                4 DQB1*03:03:02:01 (count: 35124)
                                5 DQB1*03:03:02:04 (count: 33529)
                                6 DQB1*03:03:02:03 (count: 33404)
                                7 DQB1*03:211 (count: 26815)
                                8 DQB1*03:02:01:01 (count: 26041)
                                9 DQB1*03:05:01 (count: 25491)
                                10 DQB1*03:02:01:02 (count: 25316)

                                1 ranked DQB1*03:03:02:02 (abundance: 100.00%)
                        54172 reads and 30671 pairs are aligned
                                1 DRB1*09:21 (count: 29335)
                                2 DRB1*07:01:01:01 (count: 16624)
                                3 DRB1*07:01:01:02 (count: 16543)
                                4 DRB1*07:01:01:03 (count: 16358)
                                5 DRB1*09:01:02 (count: 12732)
                                6 DRB1*07:01:18 (count: 12720)
                                7 DRB1*09:19 (count: 12644)
                                8 DRB1*09:01:10 (count: 12617)
                                9 DRB1*09:04 (count: 12607)
                                10 DRB1*09:18 (count: 12601)

                                1 ranked DRB1*09:01:02 (abundance: 75.19%)
                                2 ranked DRB1*07:01:18 (abundance: 24.81%)
                        2 reads and 1 pairs are aligned
                                1 H*01:01:01:01 (count: 1)
                                2 H*01:01:01:03 (count: 1)
                                3 H*01:02 (count: 1)
                                4 H*01:01:01:02 (count: 1)

                                1 ranked H*01:01:01:03 (abundance: 22.22%)
                                2 ranked H*01:02 (abundance: 11.11%)
                                3 ranked H*02:02 (abundance: 11.11%)
                                4 ranked H*02:03 (abundance: 11.11%)
                                5 ranked H*02:04 (abundance: 11.11%)
                                6 ranked H*02:05 (abundance: 11.11%)
                                7 ranked H*02:06 (abundance: 11.11%)
                                8 ranked H*03:01 (abundance: 11.11%)
                        315 reads and 161 pairs are aligned
                                1 K*01:02 (count: 161)
                                2 K*01:01:01:04 (count: 36)
                                3 K*01:01:01:01 (count: 22)
                                4 K*01:01:01:03 (count: 22)
                                5 K*01:03 (count: 14)
                                6 K*01:01:01:02 (count: 13)

                                1 ranked K*01:02 (abundance: 100.00%)

The result seems to be correct compared with other tools. I look forward to version 1.3.1. Alternatively, could you give me a list of install commands for setting up a new directory in which HISAT-genotype runs correctly?

thanks for your reply.

chbe-helix commented 4 years ago

Hi Yubau,

You can use the following commands to set things up in the future. You can find all of the information you need here: https://daehwankimlab.github.io/hisat-genotype/tutorials/ and https://daehwankimlab.github.io/hisat-genotype/manual/

Please read the following code carefully and make changes where you see fit.

git clone --recurse-submodules https://github.com/DaehwanKimLab/hisat-genotype ~/hisatgenotype
cd ~/hisatgenotype/hisat2
make hisat2-align-s hisat2-build-s hisat2-inspect-s

# Note: Add these to your .bashrc file to make them permanent 
export PATH=~/hisatgenotype:~/hisatgenotype/hisat2:$PATH
export PYTHONPATH=~/hisatgenotype/hisatgenotype_modules:$PYTHONPATH

cd ~/PATH/TO/WORKING/DIRECTORY
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat-genotype/data/genotype_genome_20180128.tar.gz
tar xvzf genotype_genome_20180128.tar.gz

hisatgenotype --base hla -1 $FILE1 -2 $FILE2

Hope this helps! Thanks for using HISAT-genotype!

Thanks, Chris
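
After the `export PATH=...` step, a quick check that the launcher and the three HISAT2 binaries are actually reachable can save a failed run. This is a minimal sketch: the tool names are taken from the commands in this comment, and the function name `missing_tools` is hypothetical.

```python
import shutil

# Names taken from the launcher script and the make targets in the setup commands.
REQUIRED_TOOLS = ("hisatgenotype", "hisat2-align-s", "hisat2-build-s", "hisat2-inspect-s")


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found on PATH (or on `path`, if given)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

An empty list from `missing_tools()` means all four names resolve on the current PATH; it does not verify that `PYTHONPATH` is set correctly.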

yubau1112 commented 3 years ago

Hi,

The HISAT-genotype website says, under "Adding HISAT-genotype to PATH":

$ export PATH=~/hisatgenotype:~/hisatgenotype/hisat2:$PATH
$ export PYTHONPATH=~/hisatgenotype/hisatgenotype_modules:$PYTHONPATH

to make the binaries built above and other python scripts available everywhere, right?

So I ran the command from another directory: hisatgenotype --base hla -1 $FILE1 -2 $FILE2

but it says "Error: genotype_genome related files are missing!"

But I do have them, in the ~/hisatgenotype directory.

So I tried "-x", like this: hisatgenotype -x ~/hisatgenotype/genotype_genome --base hla -1 $FILE1 -2 $FILE2

and I get an error.

Can I run hisatgenotype --base hla -1 $FILE1 -2 $FILE2 from any directory and have it find the genotype_genome related files?

Or must I download the genotype_genome related files into each directory?

I know I can run the command in the ~/hisatgenotype directory and write the output to any directory, but that is very inconvenient.

thanks!!

yubau

chbe-helix commented 3 years ago

Hi Yubau,

With the current v1.3.0, yes, you need genotype_genome in every directory in which you run hisatgenotype. This is an oversight in how we designed hisatgenotype. The new version 1.3.1 fixes this. I am currently testing this new version extensively and hope to have it released in a week. Thanks for your patience while I add these new features!

Thanks, Chris
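
Until 1.3.1 is available, one possible workaround is to symlink the index files into each working directory instead of copying or downloading them again. The sketch below is untested against HISAT-genotype itself and assumes the tool only looks for files named `genotype_genome*` in the current directory; the function name and its parameters are hypothetical.

```python
import os
from pathlib import Path


def link_index_files(index_dir, work_dir, prefix="genotype_genome"):
    """Symlink files in index_dir whose names start with prefix into work_dir.

    Archives (.tar.gz) and names that already exist in work_dir are skipped.
    Returns the sorted names of the links that were created.
    """
    index_dir = Path(index_dir).resolve()
    work_dir = Path(work_dir)
    created = []
    for src in sorted(index_dir.iterdir()):
        if not (src.is_file() and src.name.startswith(prefix)):
            continue
        if src.name.endswith(".tar.gz"):
            continue
        dest = work_dir / src.name
        if dest.exists() or dest.is_symlink():
            continue
        os.symlink(src, dest)
        created.append(src.name)
    return created
```

For example, `link_index_files(Path.home() / "hisatgenotype", ".")` would link the index from the install directory into the current one. Whether the `hla.*` files also need to be present is not covered by this sketch.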

chbe-helix commented 3 years ago

Hi Yubau,

The new version of HISATgenotype (1.3.1) has been released and has a new option to direct HISATgenotype to an index folder. You should now have to download the index only once, at install time if you wish. The manual will be updated with these changes soon. Thanks!

Thanks, Chris

yubau1112 commented 3 years ago

Hi,

I am glad to hear this news! Thanks!

yubau

yubau1112 commented 3 years ago

Hi, I tried:

[clinical@i2 bin]$ git clone https://github.com/DaehwanKimLab/hisat-genotype.git ~/hisatgenotype
Cloning into '/home/clinical/hisatgenotype'...
remote: Enumerating objects: 450, done.
remote: Counting objects: 100% (450/450), done.
remote: Compressing objects: 100% (209/209), done.
remote: Total 7131 (delta 296), reused 353 (delta 231), pack-reused 6681
Receiving objects: 100% (7131/7131), 15.36 MiB | 3.23 MiB/s, done.
Resolving deltas: 100% (4582/4582), done.
[clinical@i2 bin]$ cd ~
[clinical@i2 ~]$ cd hisatgenotype
[clinical@i2 hisatgenotype]$ bash setup.sh -r
Setting up HISAT2
> No HISAT2 found on system
> Gathering Module
You need to run this command from the toplevel of the working tree.
You need to run this command from the toplevel of the working tree.
> Initiating Build
make: *** No targets specified and no makefile found.  Stop.
Adding to /home/clinical/.bashrc and sourcing
Cannot Build Indicies without HISAT2

What can I do?

Thanks! yubau

yubau1112 commented 3 years ago

git clone --recurse-submodules https://github.com/DaehwanKimLab/hisat-genotype ~/hisatgenotype
cd ~/hisatgenotype/hisat2
make
cd ~/hisatgenotype
bash setup.sh -r

This works! Thanks!
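
The failed `setup.sh -r` run came from a clone made without `--recurse-submodules`, which in git leaves the submodule directory empty, so `make` finds no makefile there. A minimal pre-flight check is sketched below, assuming the submodule lives in `hisat2` under the repository root as in the listing earlier in this thread; the function name is hypothetical.

```python
from pathlib import Path


def submodule_is_populated(repo_dir, name="hisat2"):
    """Return True if repo_dir/name exists and contains at least one entry."""
    sub = Path(repo_dir) / name
    return sub.is_dir() and any(sub.iterdir())
```

If this returns `False` for an existing clone, running `git submodule update --init --recursive` from the repository root is the usual way to fetch the submodule without cloning again.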