ewbell94 / PEPPI

Pipeline for protein-protein interaction prediction
GNU General Public License v3.0

missing SPRINGDB/70negpos.db_cs219.ffdata file #3

Open frihaka opened 1 year ago

frihaka commented 1 year ago

Hi,

I am trying to run PEPPI on a local Linux machine, using 3 protein sequences as a test run.

I have installed/compiled from source psipred (psipred.4.02.tar.gz) and blast (blast-2.2.9-amd64-linux.tar.gz) as recommended. I have also installed/compiled from source the latest hh-suite (https://github.com/soedinglab/hh-suite) and its dependencies, and downloaded its latest pdb70 database (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).

I have downloaded PEPPI and installed it with install.sh, answering its prompts as follows:

Are you on a slurm HPC system? (WARNING: PEPPI will run slowly without HPC parallelization) [y/n] n

Full path of where you wish to install PEPPI: /home/hae/anaconda3/envs/peppi

Full path to your HHsuite installation: /home/hae/bin/HHSuite3/hh-suite/build

Full path to the database used for hhblits: /home/hae/bin/HHSuite3/pdbDB/pdb70

Full path of your python interpreter: /home/hae/anaconda3/envs/peppi/bin/python

What is your C++ compiler? g++

What is your fortran compiler? gfortran

The working directory - where the main pipeline script is launched - has the following tree:

├── A.fasta
├── B.fasta
├── LICENSE
├── PEPPI1.pl
├── PEPPIcontainer
│   ├── PEPPIconda.yml
│   └── PEPPIcontainer.def
├── README.md
├── bin
│   ├── CTNN
│   ├── CTmod
│   ├── CTpred.py
│   ├── NWalign
│   ├── PEPPI2temp.pl
│   ├── PEPPI3temp.py
│   ├── PRISMmod
│   ├── SEQmod
│   ├── SPRINGNEGmod
│   ├── SPRINGmod
│   ├── STRINGmod
│   ├── TMalign
│   ├── blastp
│   ├── charge_inp.dat
│   ├── compileRes.sh
│   ├── compiled_source
│   ├── dcomplex
│   ├── dimMap
│   ├── fort.21_alla
│   ├── getHashcode.py
│   ├── install.sh
│   ├── makeHHR.pl
│   ├── model_multiD
│   ├── multiwrapper.pl
│   ├── oldcomplex
│   ├── runSetWrapper.pl
│   ├── seqSearch.pl
│   ├── trainCT.py
│   └── trainDists.py
├── cmd.sh
├── install.sh
├── lib
│   ├── CTtrainvecs.txt
│   ├── SEQ
│   ├── SPRINGDB
│   ├── STRING
│   └── trainNB.txt
└── test
    ├── A.fasta
    ├── B.fasta
    ├── LR.csv
    ├── PEPPI2.pl
    ├── PEPPI3.py
    ├── PPI
    ├── allres.txt
    ├── mono
    ├── protcodeA.csv
    └── protcodeB.csv

The original PEPPI install/download does not include a SPRINGDB/70negpos.db_cs219.ffdata file. The lib/SPRINGDB directory contains:

├── 70CDHITstruct.txt
├── 70negpos.db
├── 70negpos.mono
├── 70negs.txt
├── monomers
└── monomers.aliases
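
For context, an hh-suite 3 database normally consists of six ffindex files sharing one prefix: `_a3m`, `_hhm` and `_cs219`, each with a `.ffdata` and a `.ffindex` file. The sketch below reports which of these are absent for a given prefix. It assumes PEPPI passes `lib/SPRINGDB/70negpos.db` to hh-suite as the database prefix, which is inferred from the error path in the log further down and not confirmed from the PEPPI source. `missing_hhsuite_files` is a hypothetical helper name, not part of PEPPI or hh-suite.

```python
from pathlib import Path

# File suffixes hh-suite 3 looks for next to a database prefix.
HHSUITE_SUFFIXES = [
    f"_{kind}.{ext}"
    for kind in ("a3m", "hhm", "cs219")
    for ext in ("ffdata", "ffindex")
]


def missing_hhsuite_files(db_prefix):
    """Return the expected database files that do not exist for db_prefix."""
    prefix = str(db_prefix)
    return [
        prefix + suffix
        for suffix in HHSUITE_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]


if __name__ == "__main__":
    # Assumed prefix, relative to the PEPPI install directory.
    for path in missing_hhsuite_files("lib/SPRINGDB/70negpos.db"):
        print("missing:", path)
```

Run from the PEPPI install directory, this would list every database file hh-suite expects but cannot find, which shows whether only the `_cs219` files are absent or the whole ffindex set.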

I have checked that the scripts bin/makeHHR.pl and bin/seqSearch.pl have the correct local paths.

When I launch the main script:

PEPPI1.pl

the pipeline seems to run fine at the beginning (the hh-suite tools kick in as expected), and the output directory contains the following:

├── PEPPI
│   ├── A.fasta
│   ├── B.fasta
│   ├── PEPPI2.pl
│   ├── mono
│   ├── protcodeA.csv
│   └── protcodeB.csv

However, the pipeline then fails to find "SPRINGDB/70negpos.db_cs219.ffdata" and therefore never writes the final results file:

prot1
HHR
- 12:32:53.429 INFO: Search results will be written to /tmp/hae/makeHHR_prot1_464127/prot1.hhr

- 12:32:53.456 INFO: Searching 92111 column state sequences.

- 12:32:53.501 INFO: /tmp/hae/makeHHR_prot1_464127/prot1.fasta is in A2M, A3M or FASTA format

- 12:32:53.501 INFO: Iteration 1

- 12:32:53.666 INFO: Prefiltering database

- 12:32:54.078 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 4748

- 12:32:54.126 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 2755

- 12:32:54.126 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2755

- 12:32:54.126 INFO: Scoring 2755 HMMs using HMM-HMM Viterbi alignment

- 12:32:54.230 INFO: Alternative alignment: 0

- 12:32:55.312 INFO: 2000 alignments done

- 12:32:55.831 INFO: 2755 alignments done

- 12:32:55.833 INFO: Alternative alignment: 1

- 12:32:57.405 INFO: 2650 alignments done

- 12:32:57.410 INFO: Alternative alignment: 2

- 12:32:57.914 INFO: 467 alignments done

- 12:32:57.914 INFO: Alternative alignment: 3

- 12:32:58.132 INFO: 76 alignments done

- 12:32:58.924 INFO: Premerge done

- 12:32:58.924 INFO: Realigning 500 HMM-HMM alignments using Maximum Accuracy algorithm

- 12:34:56.226 INFO: 1284 sequences belonging to 1284 database HMMs found with an E-value < 0.001

- 12:34:56.226 INFO: Number of effective sequences of resulting query HMM: Neff = 11.2888

- 12:34:56.239 INFO: Iteration 2

- 12:34:56.239 INFO: Set premerge to 0! (premerge: 3 iteration: 2 hits.Size: 1281)

- 12:34:56.407 INFO: Prefiltering database

- 12:34:56.820 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 4871

- 12:34:56.863 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 2556

- 12:34:56.863 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 1389

- 12:34:56.863 INFO: Scoring 1389 HMMs using HMM-HMM Viterbi alignment

- 12:34:56.950 INFO: Alternative alignment: 0

- 12:34:57.865 INFO: 1389 alignments done

- 12:34:57.868 INFO: Alternative alignment: 1

- 12:34:58.706 INFO: 1228 alignments done

- 12:34:58.708 INFO: Alternative alignment: 2

- 12:34:58.929 INFO: 79 alignments done

- 12:34:58.930 INFO: Alternative alignment: 3

- 12:34:59.108 INFO: 31 alignments done

- 12:34:59.156 INFO: Rescoring previously found HMMs with Viterbi algorithm

- 12:34:59.233 INFO: Alternative alignment: 0

- 12:34:59.744 INFO: 1167 alignments done

- 12:34:59.747 INFO: Alternative alignment: 1

- 12:35:00.282 INFO: 1167 alignments done

- 12:35:00.285 INFO: Alternative alignment: 2

- 12:35:00.456 INFO: 196 alignments done

- 12:35:00.456 INFO: Alternative alignment: 3

- 12:35:00.500 INFO: 32 alignments done

- 12:35:00.571 INFO: Realigning 500 HMM-HMM alignments using Maximum Accuracy algorithm

- 12:35:02.405 INFO: 1284 sequences belonging to 1284 database HMMs found with an E-value < 0.001

- 12:35:02.405 INFO: Number of effective sequences of resulting query HMM: Neff = 11.2888

$ cp /tmp/hae/makeHHR_prot1_464127/prot1.a3m /tmp/2YVboks0ez/9UFyIA7YHI.1.in.a3m
Filtering alignment to diversity 7 ...
$ hhfilter -v 1 -neff 7 -i /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m -o /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m
$ /home/hae/bin/HHSuite3/hh-suite/build/scripts/reformat.pl -v 1 -r -noss a3m psi /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m /tmp/2YVboks0ez/9UFyIA7YHI.in.psi
Predicting secondary structure with PSIPRED ... $ /home/hae/bin/HHSuite3/BLAST//blastpgp -b 1 -j 1 -h 0.001 -d /home/hae/bin/HHSuite3/hh-suite/build/data/do_not_delete -i /tmp/2YVboks0ez/9UFyIA7YHI.sq -B /tmp/2YVboks0ez/9UFyIA7YHI.in.psi -C /tmp/2YVboks0ez/9UFyIA7YHI.chk 1> /tmp/2YVboks0ez/9UFyIA7YHI.blalog 2> /tmp/2YVboks0ez/9UFyIA7YHI.blalog
$ echo 9UFyIA7YHI.chk > /tmp/2YVboks0ez/9UFyIA7YHI.pn

$ echo 9UFyIA7YHI.sq  > /tmp/2YVboks0ez/9UFyIA7YHI.sn

$ /home/hae/bin/HHSuite3/BLAST//makemat -P /tmp/2YVboks0ez/9UFyIA7YHI
$ /home/hae/bin/HHSuite3/PSIPRED/psipred/bin/psipred /tmp/2YVboks0ez/9UFyIA7YHI.mtx /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat2 /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat3 > /tmp/2YVboks0ez/9UFyIA7YHI.ss
$ /home/hae/bin/HHSuite3/PSIPRED/psipred/bin/psipass2 /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights_p2.dat 1 0.98 1.09 /tmp/2YVboks0ez/9UFyIA7YHI.ss2 /tmp/2YVboks0ez/9UFyIA7YHI.ss > /tmp/2YVboks0ez/9UFyIA7YHI.horiz
done 
- 12:35:03.826 INFO: /tmp/hae/makeHHR_prot1_464127/prot1.a3m is in A2M, A3M or FASTA format

- 12:35:03.847 WARNING: MSA prot1 looks too diverse (Neff=12.227>11). Better check it with an alignment viewer for non-homologous segments. Also consider building the MSA with hhblits using the - option to limit MSA diversity.

- 12:35:03.853 INFO: Search results will be written to /tmp/hae/makeHHR_prot1_464127/prot1.hhr

- 12:35:03.853 ERROR: In /home/hae/bin/HHSuite3/hh-suite/src/ffindexdatabase.cpp:11: FFindexDatabase:

- 12:35:03.853 ERROR:   could not open file '/home/hae/anaconda3/envs/peppi/PEPPI/lib/SPRINGDB/70negpos.db_cs219.ffdata'

benchmark: 0
Target: prot1
Query         prot1
Match_columns 234
No_of_seqs    552 out of 4235
Neff          11.2888
Searched_HMMs 2908
Date          Thu Jun 29 12:35:02 2023
Command       /home/hae/bin/HHSuite3/hh-suite/build/bin/hhblits -i /tmp/hae/makeHHR_prot1_464127/prot1.fasta -oa3m /tmp/hae/makeHHR_prot1_464127/prot1.a3m -d /home/hae/bin/HHSuite3/pdbDB/pdb70 -n 2 -e 0.001
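
In a long log like the one above, the failing step can be located by searching for the "could not open file" message. The following is a minimal sketch, assuming the error format stays as printed here. `find_unopened_files` is a hypothetical helper, not part of PEPPI or hh-suite.

```python
import re

# Matches error lines of the form:
#   ERROR:   could not open file '/path/to/file'
_PATTERN = re.compile(r"ERROR:\s+could not open file '([^']+)'")


def find_unopened_files(log_text):
    """Return every path reported as unopenable, in order of appearance."""
    return _PATTERN.findall(log_text)


# Two lines copied from the log above.
sample = (
    "- 12:35:03.853 ERROR: In /home/hae/bin/HHSuite3/hh-suite/src/"
    "ffindexdatabase.cpp:11: FFindexDatabase:\n"
    "- 12:35:03.853 ERROR:   could not open file "
    "'/home/hae/anaconda3/envs/peppi/PEPPI/lib/SPRINGDB/70negpos.db_cs219.ffdata'\n"
)
print(find_unopened_files(sample))
```

On the two sample lines, only the second matches, so the result is a single path ending in `70negpos.db_cs219.ffdata`.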

How do I obtain the SPRINGDB/70negpos.db_cs219.ffdata file?

Thanks for your help in advance!

lannal8 commented 1 day ago

@frihaka Did you finally solve it? I have this same issue.