Closed: XuBlack closed this issue 3 years ago.
The last error occurred in:
File "/DIPS-Plus/project/utils/utils.py", line 390, in find_fasta_sequences_for_pdb_file
fasta_files = [os.path.join(external_feats_subdir, file) for file in os.listdir(external_feats_subdir)
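(For reference, the failing list comprehension could be hardened with an explicit existence check so the error message points at the likely cause. This is only an illustrative sketch, not the repository's actual code; the helper name, the `.fasta` suffix filter, and the error message are assumptions.)

```python
import os

def find_fasta_files(external_feats_subdir):
    # Hypothetical hardening of the failing lookup above; the ".fasta"
    # suffix filter and the error wording are illustrative assumptions.
    if not os.path.isdir(external_feats_subdir):
        raise FileNotFoundError(
            f"FASTA work directory is missing: {external_feats_subdir}. "
            "Did generate_hhsuite_features.py run to completion first?"
        )
    return [
        os.path.join(external_feats_subdir, name)
        for name in os.listdir(external_feats_subdir)
        if name.endswith(".fasta")
    ]
```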
Hi, @XuBlack.
In our feature generation pipeline, when you run the script generate_hhsuite_features.py, it should write these FASTA sequence files to the directory you listed above (e.g., /opt/data/private/protein/DIPS-Plus/project/datasets/DB5/interim/external_feats/OF/work), assuming you provided generate_hhsuite_features.py with the value "$PROJDIR"/project/datasets/DB5/interim/external_feats for the CLI argument output_dir. This logic is housed in the atom3-py3 library that DIPS-Plus makes use of. Specifically, you can find where the FASTA sequence files should be written to local storage here.
Since HH-suite3 takes FASTA sequence files as input, we have to extract the FASTA sequence for each input PDB file and write it to local storage before running HH-suite3 in generate_hhsuite_features.py. When you then run postprocess_pruned_pairs.py in the way that you did, that script should assemble the full filepath to each of the previously-generated FASTA sequence files corresponding to each PDB file you are postprocessing.
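For illustration, writing one per-chain FASTA record might look like the sketch below. This is only a hedged illustration: the real extraction of sequences from PDB files lives in atom3-py3, and the function name and filename convention here are assumptions.

```python
import os
import textwrap

def write_fasta_record(sequence, pdb_code, chain_id, out_dir):
    # Hypothetical sketch only: atom3-py3 performs the real PDB-to-FASTA
    # extraction; the "<code>_<chain>.fasta" naming here is illustrative.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{pdb_code}_{chain_id}.fasta")
    with open(path, "w") as fh:
        fh.write(f">{pdb_code}_{chain_id}\n")
        # Wrap the sequence at 60 characters per line, a common FASTA width
        fh.write(textwrap.fill(sequence, width=60) + "\n")
    return path
```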
Can you confirm that, for the DB5 dataset, you ran generate_hhsuite_features.py before running postprocess_pruned_pairs.py? Also, can you verify whether generate_hhsuite_features.py indeed wrote the FASTA sequence files to "$PROJDIR"/project/datasets/DB5/interim/external_feats as expected?
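A quick way to verify the second point is to count the FASTA files under that directory tree; a minimal sketch (the `.fasta` extension is an assumption about the naming convention):

```python
import os

def count_fasta_files(root):
    # Walk the external_feats tree and count FASTA files anywhere beneath it;
    # the ".fasta" suffix is an assumed naming convention, not verified.
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(name.endswith(".fasta") for name in filenames)
    return total

# e.g. count_fasta_files(os.path.expandvars("$PROJDIR/project/datasets/DB5/interim/external_feats"))
```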
Thank you for your reply!
I have run generate_hhsuite_features.py before running postprocess_pruned_pairs.py. Following your pointer, after reading the source code of atom3-py3, I found that when I run the command

python3 "$PROJDIR"/project/datasets/builder/generate_hhsuite_features.py "$PROJDIR"/project/datasets/DB5/interim/parsed "$PROJDIR"/project/datasets/DB5/interim/parsed "$HHSUITE_DB" "$PROJDIR"/project/datasets/DB5/interim/external_feats --rank "$1" --size "$2" --num_cpu_jobs 4 --num_cpus_per_job 8 --num_iter 2 --source_type db5 --write_file

it only generates a CSV file. I need to change the command's --write_file flag to --read_file and run it again to generate the HH-suite features.
When I ran it again with the new --read_file flag, another error occurred.
For the DB5 dataset, when make_dataset.py runs, the path of the generated pkl file is output_dir + '/' + db.get_pdb_code(pdb_filename)[1:3] + db.get_pdb_name(pdb_filename) + ".pkl", as written in lines 48-57 of parse.py in atom3-py3. But when conservation.py in atom3-py3 parses the path in lines 451-454, the path obtained is output_dir + '/' + db.get_pdb_code(pdb_filename) + db.get_pdb_name(pdb_filename) + ".pkl". The two paths are different.
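To make the mismatch concrete, here is a minimal sketch reproducing both path constructions side by side. The get_pdb_code and get_pdb_name functions below are hypothetical stand-ins for the atom3 db helpers, and the example directory and filename are made up for illustration.

```python
import os

def get_pdb_code(pdb_filename):
    # Hypothetical stand-in for db.get_pdb_code, e.g. "1abc" from "1abc.pdb"
    return os.path.basename(pdb_filename)[:4]

def get_pdb_name(pdb_filename):
    # Hypothetical stand-in for db.get_pdb_name
    return os.path.splitext(os.path.basename(pdb_filename))[0]

output_dir = "/data/DB5/interim/parsed"
pdb_filename = "1abc.pdb"

# Path as written (parse.py, lines 48-57): uses the middle two characters
# of the PDB code ([1:3]) as a shard prefix
written = output_dir + '/' + get_pdb_code(pdb_filename)[1:3] + get_pdb_name(pdb_filename) + ".pkl"

# Path as reconstructed before the fix (conservation.py, lines 451-454):
# uses the full PDB code instead, so the lookup misses the written file
parsed = output_dir + '/' + get_pdb_code(pdb_filename) + get_pdb_name(pdb_filename) + ".pkl"

print(written)  # /data/DB5/interim/parsed/ab1abc.pkl
print(parsed)   # /data/DB5/interim/parsed/1abc1abc.pkl
```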
After I rewrote the path, it runs successfully!
Once it has generated all of the HH-suite features successfully, I will try to run postprocess_pruned_pairs.py.
Thanks!
@XuBlack,
Once your HH-suite features have finished generating, and you can verify that your complexes are postprocessed successfully by postprocess_pruned_pairs.py, would you be able to share with me which lines of code in either this repository or the atom3-py3 repository you needed to change to generate DB5 complexes? You can either reply with your changes here or, if you'd rather, you are also welcome to open a pull request to merge your changes into master.
I greatly appreciate your attention to detail as you use this pipeline! It seems I may have missed making some changes to the filepaths used in this project since updating the DeepInteract repository. I will try to get those corrected once we know exactly which filepaths are currently incorrect for the DB5 dataset.
I have created a pull request in the atom3-py3 repository.
And thank you for sharing the DeepInteract repository. I am very interested in it and will learn more about it.
@XuBlack, I just finished upgrading DIPS-Plus' (this repository's) version of atom3-py3 to include the bug fix you authored over in the atom3-py3 repository. If you encounter any further issues in the construction of filepaths, please let us know. We appreciate your pointing this bug out.
When I run the command

python3 project/datasets/builder/postprocess_pruned_pairs.py "$PROJDIR"/project/datasets/DB5/raw "$PROJDIR"/project/datasets/DB5/interim/pairs "$PROJDIR"/project/datasets/DB5/interim/external_feats "$PROJDIR"/project/datasets/DB5/final/raw --num_cpus 32 --source_type db5

it raises

FileNotFoundError: [Errno 2] No such file or directory: '/opt/data/private/protein/DIPS-Plus/project/datasets/DB5/interim/external_feats/OF/work'

It seems that the FASTA files are missing. I couldn't find the code for processing FASTA files in what you have shared. Is some code missing on GitHub? Or do I need to generate the files myself?
Thanks.