Open sebel76 opened 3 years ago
Hi @sebel76,
This looks like a problem with your PSIPRED installation (used for secondary structure prediction before the actual folding). I suggest setting up PSIPRED first and making sure it runs on its own before using it in EVcouplings.
Are those two files made, and if so, do they have any content?
/Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5/fold/psipred/dcl5.ss2, /Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5/fold/psipred/dcl5.horiz
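A quick way to answer that question is to check both files from Python. This is only a minimal sketch: `report_outputs` is a hypothetical helper name, not part of EVcouplings or PSIPRED, and the two paths are the ones from the error message.

```python
from pathlib import Path


def report_outputs(paths):
    """Return {path: status}, where status is 'missing', 'empty' or 'ok (N bytes)'."""
    report = {}
    for p in map(Path, paths):
        if not p.is_file():
            report[str(p)] = "missing"
        elif p.stat().st_size == 0:
            report[str(p)] = "empty"
        else:
            report[str(p)] = f"ok ({p.stat().st_size} bytes)"
    return report


if __name__ == "__main__":
    # Paths copied from the ResourceError message; adjust for other jobs.
    base = Path("/Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5/fold/psipred")
    for path, status in report_outputs([base / "dcl5.ss2", base / "dcl5.horiz"]).items():
        print(f"{status:>20}  {path}")
```

If either file is reported as missing or empty, the problem is on the PSIPRED side rather than in the folding stage itself.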
Hi,
I installed PSIPRED following the developer's instructions and compiled the program. I use the runpsipredplus script (because I have BLAST+). With EVcouplings, I observed that I had to put the BLAST index directly in the working directory, otherwise it returns an error message.
Then I ran it with EVcouplings and got the message that I sent you. I re-ran the example that comes with PSIPRED. The program runs and everything looks okay with the BLAST part, but it does not produce the two files (.ss2 and .horiz).
I got the following error in the terminal:
Running PSI-BLAST with sequence example/example.fasta ...
Predicting secondary structure...
../bin/chkparse: Command not found.
FATAL: Error whilst running chkparse - script terminated!
This is weird, since the ../bin/chkparse program is present and runs well...
Any idea how to solve my issue?
Best, Sébastien
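One thing that may be worth ruling out (an assumption, not something confirmed in this thread): `../bin/chkparse` is a relative path, so whether it is found depends on the directory the script is started from. The sketch below resolves that relative path against a given working directory and reports whether an executable file is there. `resolve_from` is a hypothetical helper name, and the directories passed on the command line are whatever you want to test.

```python
import os
import sys
from pathlib import Path


def resolve_from(cwd, relative_cmd):
    """Resolve relative_cmd against cwd; return (absolute path, is it an executable file?)."""
    target = (Path(cwd) / relative_cmd).resolve()
    return target, target.is_file() and os.access(target, os.X_OK)


if __name__ == "__main__":
    # Pass one or more candidate working directories; defaults to the current one.
    for cwd in sys.argv[1:] or ["."]:
        target, ok = resolve_from(cwd, "../bin/chkparse")
        print(f"from {cwd}: {target} -> {'executable' if ok else 'NOT found/executable'}")
```

Running it once with the PSIPRED example folder and once with the EVcouplings job folder shows whether the relative path only works from one of them.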
Hi developer,
Thank you for your program. It is quite easy to use and generates valuable results.
I would like some help with de novo protein structure prediction using PSIPRED. I always get the same error, saying that PSIPRED cannot find the output folder for the .ss2 and .horiz files.
Why does this happen, and how can I solve it?
I have attached the config and .failed files below in case they help.
I observed two things:
I have a bulk of other questions:
Thanks for your help! Sébastien
#############
dcl5.failed
#############
Traceback (most recent call last):
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/utils/pipeline.py", line 508, in execute_wrapped
    outcfg = execute(config)
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/utils/pipeline.py", line 187, in execute
    outcfg = runner(incfg)
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/fold/protocol.py", line 714, in run
    return PROTOCOLS[kwargs["protocol"]](**kwargs)
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/fold/protocol.py", line 357, in standard
    residues = secondary_structure(**kwargs)
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/fold/protocol.py", line 116, in secondary_structure
    ss2_file, horiz_file = run_psipred(
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/fold/tools.py", line 232, in run_psipred
    verify_resources(
  File "/Users/sbelanger/opt/miniconda3/lib/python3.8/site-packages/evcouplings/utils/system.py", line 129, in verify_resources
    raise ResourceError(
evcouplings.utils.system.ResourceError: psipred output is invalid: /Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5/fold/psipred/dcl5.ss2, /Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5/fold/psipred/dcl5.horiz
#################
dcl5_config.txt
#################
# Sample configuration file for evcouplings monomer protein prediction pipeline.
# This file determines all aspects of the computation:
# - which compute environment to use
# - which stages of the pipeline to run
# - what the settings for each of the stages are
# Minimal settings required before this configuration can be executed:
# - set your environment, paths to tools and databases (at the end of this file)
# - under "global", set prefix and sequence_id
# - run it! :)
# Configuration rules:
# 1) Global settings override settings for stages
# 2) Outputs of a stage are merged into "global" and fed into the input of subsequent stages
#    (e.g., the alignment_file output of align will be used by the alignment_file input of couplings)
# 3) All settings are explicitly specified here. No hidden defaults in code.
# 4) Each stage is also passed the parameters in the "databases" and "tools" sections
pipeline: protein_monomer
# which stages of workflow to run. Uncomment downstream stages using # (however, no stage can be run before the previous
# stage has been run)
stages:
# Global job settings (which protein, region). These will override settings of the same name in each of the stages.
# These are typically the settings you want to modify for each of your jobs, together with some settings in the align stage.
global:
    # mandatory output prefix of the job (e.g. output/HRAS will store outputs in folder "output", using files prefixed with "HRAS")
    prefix: /Users/sbelanger/Documents/research/orthology/prot.coevolution/dcl5
    sequence_id: DCL5.aa
    sequence_file: /Users/sbelanger/Documents/research/orthology/prot.coevolution/seq/DCL5.aa.fasta
    region:
    theta: 0.8
    cpu: 10
# Specify multiple batch jobs (if empty, only a single job will be run). Each entry (e.g. b_0.75) will be appended to
# global.prefix to uniquely identify the subjob. Parameters for individual stages that should be overridden for each
# subjob have to be specified, for all other parameters jobs share the same values.
batch:
#    _b0.75:
#        align: {domain_threshold: 0.75, sequence_threshold: 0.75}
#    _b0.3:
#        align: {domain_threshold: 0.3, sequence_threshold: 0.3}
# Sequence alignment generation/processing.
align:
    # standard: iterative sequence search and postprocessing using jackhmmer.
    protocol: standard
    first_index: 1
    use_bitscores: true
    domain_threshold: 0.5
    sequence_threshold: 0.5
    iterations: 5
    database: uniref100
    compute_num_effective_seqs: false
    seqid_filter:
    minimum_sequence_coverage: 50
    minimum_column_coverage: 70
    extract_annotation: true
    cpu:
    nobias: false
    reuse_alignment: true
    checkpoints_hmm: false
    checkpoints_ali: false
    # Alternative protocol: reuse existing alignment and apply postprocessing to generate alignment that is consistent
    # with pipeline requirements. Uncomment, and comment all values in align section above to enable the "existing" protocol
    # protocol: existing
    # prefix:
    # # Path of input alignment. Alignment needs to contain region in form SEQID/start-end, or first_index must be set
    # input_alignment:
    # sequence_id:
    # first_index:
    # compute_num_effective_seqs: False
    # theta:
    # seqid_filter:
    # minimum_sequence_coverage: 50
    # minimum_column_coverage: 70
    # extract_annotation: True
# Inference of evolutionary couplings from sequence alignment
couplings:
    # current options:
    protocol: standard
    iterations: 100
    alphabet:
    ignore_gaps: true
    lambda_J: 0.01
    lambda_J_times_Lq: true
    lambda_h: 0.01
    lambda_group:
    scale_clusters:
    # reuse ECs and model parameters, if this stage has been run before
    reuse_ecs: true
    min_sequence_distance: 6
    scoring_model: logistic_regression
    # Alternative protocol: compute couplings with mean field direct coupling analysis
    # Uncomment, and comment all values in align section above to enable the mean_field protocol
# Compare ECs to known 3D structures
compare:
    # Current options: standard
    protocol: standard
    prefix:
    sequence_id:
    ec_file:
    target_sequence_file:
    by_alignment: true
    pdb_ids:
    max_num_structures: 10
    max_num_hits: 25
    compare_multimer: true
    sequence_file:
    first_index:
    region:
    alignment_min_overlap: 20
    use_bitscores: true
    domain_threshold: 0.1
    sequence_threshold: 0.1
    atom_filter:
    distance_cutoff: 5
    min_sequence_distance: 6
    plot_probability_cutoffs: [0.90, 0.99]
    plot_lowest_count: 0.05
    plot_highest_count: 1.0
    plot_increase: 0.05
    boundaries: union
    scale_sizes: true
    draw_secondary_structure: true
    draw_coverage: true
    print_pdb_information: true
    pdb_alignment_method: jackhmmer
# Settings for Mutation effect predictions
mutate:
    # Options: standard
    protocol: standard
    mutation_dataset_file:
# Settings for 3D structure prediction
fold:
    # Options: standard
    protocol: standard
    engine: cns_dgsa
    folding_config_file:
    cut_to_alignment_region: true
    sec_struct_method: psipred
    reuse_sec_struct: true
    sec_struct_file:
    filter_sec_struct_clashes: true
    min_sequence_distance: 6
    fold_probability_cutoffs: [0.90, 0.99]
    fold_lowest_count: 0.5
    fold_highest_count: 1.3
    fold_increase: 0.05
    num_models: 10
    cleanup: true
# These settings allow job status tracking using a database, and result collection in an archive
management:
    # URI of database
    database_uri:
    job_name:
    archive: [target_sequence_file, statistics_file, alignment_file, frequencies_file, ec_file, ec_longrange_file, model_file, enrichment_file, evzoom_file, enrichment_pml_files, ec_lines_pml_file, contact_map_files, ec_compared_all_file, ec_compared_longrange_file, remapped_pdb_files, mutations_epistatic_pml_files, mutation_matrix_file, mutation_matrix_plot_files, secondary_structure_pml_file, folding_ec_file, folded_structure_files, folding_ranking_file, folding_comparison_file, folding_individual_comparison_files, ec_lines_compared_pml_file, pdb_structure_hits_file, sec_struct_file]
# Computational environment for batch jobs (using evcouplings command line application)
environment:
    # current options for engine: lsf, local, slurm (for local, only set cores and leave all other fields blank)
    engine: local
    queue:
    cores: 10
    memory:
    time:
    configuration:
# Paths to databases used by evcouplings.
databases:
    # Sequence databases (only download the ones you want to use). You can also specify arbitrary databases in FASTA format
    uniprot: /Users/sbelanger/Documents/research/data/reference/protein/uniprot/uniprot/uniprot_current.fasta
    uniref100: /Users/sbelanger/Documents/research/data/reference/protein/uniprot/uniref100/uniref100_current.fasta
    uniref90: /Users/sbelanger/Documents/research/data/reference/protein/uniprot/uniref90/uniref90_current.fasta
    sequence_download_url: http://www.uniprot.org/uniprot/{}.fasta
    pdb_mmtf_dir:
    sifts_mapping_table: /Users/sbelanger/Documents/research/data/reference/protein/sifts/pdb_chain_uniprot_plus_current.csv
    sifts_sequence_db: /Users/sbelanger/Documents/research/data/reference/protein/sifts/pdb_chain_uniprot_plus_current.fasta
# Paths to external tools used by evcouplings. Please refer to README.md for installation instructions and which tools are required.
tools:
    jackhmmer: /Users/sbelanger/opt/miniconda3/bin/jackhmmer
    hmmbuild: /Users/sbelanger/opt/miniconda3/bin/hmmbuild
    hmmsearch: /Users/sbelanger/opt/miniconda3/bin/hmmsearch
    plmc: /usr/local/bin/plmc
    hhfilter: /Users/sbelanger/opt/miniconda3/bin/hhfilter
    psipred: /Users/sbelanger/Documents/software/psipred/runpsipred
    cns: /Users/sbelanger/Documents/software/cns_solve_1.21/intel-x86_64bit-linux/bin/cns
    maxcluster: /Users/sbelanger/Documents/software/maxcluster64bit
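
Since the fold stage depends on several external programs, it can help to confirm that every path in the tools section points to an existing, executable file. The following is a rough sketch only: `check_tools` is a hypothetical helper, not an EVcouplings function, and the dictionary simply repeats some of the paths from the config above.

```python
import os
from pathlib import Path


def check_tools(tools):
    """Map each tool name to 'ok', 'missing' or 'not executable'."""
    result = {}
    for name, path in tools.items():
        p = Path(path)
        if not p.is_file():
            result[name] = "missing"
        elif not os.access(p, os.X_OK):
            result[name] = "not executable"
        else:
            result[name] = "ok"
    return result


if __name__ == "__main__":
    # Paths copied from the tools section of the config; edit as needed.
    tools = {
        "plmc": "/usr/local/bin/plmc",
        "psipred": "/Users/sbelanger/Documents/software/psipred/runpsipred",
        "cns": "/Users/sbelanger/Documents/software/cns_solve_1.21/intel-x86_64bit-linux/bin/cns",
        "maxcluster": "/Users/sbelanger/Documents/software/maxcluster64bit",
    }
    for name, status in check_tools(tools).items():
        print(f"{name:>12}: {status}")
```

This only checks that the files exist and carry the executable bit; it does not prove that each program actually runs on this machine.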