PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

Do we need a blast index file or just a fasta file for --pdb-fasta? Need help here. #45

Closed. Yang-Wang-2020 closed this issue 5 months ago.

Yang-Wang-2020 commented 5 months ago

I used pdb-redo/others/pdbredo_seqdb.txt as the --pdb-fasta argument and got this error message: "The first line in the blast index file does not seem to be correct, please re-create a blast index using the create-index command"

Yang-Wang-2020 commented 5 months ago

I assume create-index creates a fasta file, not a blast index file, right? I am confused.

Yang-Wang-2020 commented 5 months ago

$ alphafill process --pdb-dir ../alphafill/pdb-redo --ligands TYL-ligand.cif --pdb-fasta ../alphafill/pdb-redo/others/pdbredo_seqdb.txt 4yji_apo.pdb 4yji_apo_alphafill.pdb
The first line in the blast index file does not seem to be correct, please re-create a blast index using the create-index command

$ head ../alphafill/pdb-redo/others/pdbredo_seqdb.txt

206l_A MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNASKSELDKAIGRNTNGVITKDEAEKLFNQD VDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYN QTPNRAKRVITTFRTGTWDAYKnl

I spent a lot of time on this and am still not sure what is wrong with the first line. I also tried to build a blastdb index from the txt file, but then the error message was "Could not open blast index file (pdb-fasta option)".

drlemmus commented 5 months ago

Is it missing the '>' sign on the first entry or on every entry?

Yang-Wang-2020 commented 5 months ago

The fasta file looks just fine. ">206l_A MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNASKSELDKAIGRNTNGVITKDEAEKLFNQD VDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYN QTPNRAKRVITTFRTGTWDAYKnl"

Yang-Wang-2020 commented 5 months ago

The '>' was interpreted as Markdown quote syntax in my previous post, so it was not displayed.

Yang-Wang-2020 commented 5 months ago

According to the comments in other issues, "pdb-redo/others/pdbredo_seqdb.txt" should be the correct premade input for --pdb-fasta. However, I still got this error.

"The first line in the blast index file does not seem to be correct, please re-create a blast index using the create-index command"

mhekkel commented 5 months ago

What you find on the internet may be out-of-date information, and that's what happened here. The latest version of AlphaFill requires a 'blast-index' in a different format than what is in the pdbredo_seqdb.txt file. That's why there is this 'create-index' command. It creates a new FastA file containing the sequences collected from your local installation of PDB-REDO (or PDB).

The blast algorithm in AlphaFill is one I wrote myself. It works on FastA files directly and yes, I understand that this is confusing for someone accustomed to using NCBI BLAST. My apologies for that.

Yang-Wang-2020 commented 5 months ago

Thanks for the reply. I did use 'create-index' to make the fasta file, but it also didn't work. That's why I was searching for the issue, and someone suggested using the 'pdbredo_seqdb.txt' file.

$ alphafill create-index --pdb-dir pdb-redo/ --pdb-fasta pdb-redo-new.fasta
$ head pdb-redo-new.fasta
>pdb-entity|4IYF|1|
GIVEQCCTSICSLYQLENYCG
>pdb-entity|4IYF|2|
FVNQHLCGSHLVEALYLVCGERGFFYTPK
>pdb-entity|4IYD|1|
GIVEQCCTSICSLYQLENYCG
>pdb-entity|4IYD|2|
FVNQHLCGSHLVEALYLVCGERGFFYTPK
>pdb-entity|4IYE|1|EDO;PEG

$ alphafill process --pdb-dir ../alphafill/pdb-redo/ --pdb-fasta ../alphafill/pdb-redo-new.fasta ../P14/4yji_apo.pdb ../P14/4yji_alphafill.pdb
The first line in the blast index file does not seem to be correct, please re-create a blast index using the create-index command

mhekkel commented 4 months ago

Ah... you found a bug in AlphaFill. The problem is that the regular expression I use to scan the first line does not take into account that there might be sequences without any named ligands. If you strip off the first four entries from your fasta file, then alphafill will work.

Meanwhile I'll fix the code.
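
For readers following along, here is a minimal sketch of the kind of bug described above. Both patterns are hypothetical and written only for this illustration; the actual regular expression used in AlphaFill is not shown in this thread. The header layout (`>pdb-entity|ID|entity|ligands`) is taken from the `head pdb-redo-new.fasta` output earlier in the issue.

```python
import re

# Hypothetical patterns for illustration only; these are NOT taken from
# the AlphaFill source code.
# STRICT requires at least one ligand ID after the last '|'.
STRICT = re.compile(r"^>pdb-entity\|(\w{4})\|(\w+)\|(\w+(?:;\w+)*)$")
# LENIENT makes the ligand list optional, so an empty field is accepted.
LENIENT = re.compile(r"^>pdb-entity\|(\w{4})\|(\w+)\|((?:\w+(?:;\w+)*)?)$")

# Header lines copied from the create-index output shown in this issue.
with_ligands = ">pdb-entity|4IYE|1|EDO;PEG"
without_ligands = ">pdb-entity|4IYF|1|"


def accepts(pattern, header):
    """Return True if the header line matches the given pattern."""
    return pattern.match(header) is not None
```

Under these assumptions, the strict pattern rejects a first line such as `>pdb-entity|4IYF|1|`, which would explain why an otherwise valid index is reported as incorrect when its first entry has no ligands.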

Yang-Wang-2020 commented 4 months ago

Thank you! Without the first 4 sequences, it is finally working now!
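
For anyone hitting the same error before a fixed release is available, the workaround from this thread can be sketched in Python. This is only a sketch under stated assumptions: the function names are hypothetical, the header layout is assumed to be `>pdb-entity|ID|entity|ligands` as in the output above, and the dropped entries are removed from the index entirely rather than moved elsewhere.

```python
def strip_leading_entries_without_ligands(lines):
    """Drop leading FastA entries whose header has an empty ligand field.

    Returns the lines starting at the first header whose fourth
    '|'-separated field is non-empty, or an empty list if none exists.
    """
    for i, line in enumerate(lines):
        if line.startswith(">"):
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 4 and fields[3].strip():
                return lines[i:]
    return []


def rewrite_index(src_path, dst_path):
    """Read an index file, strip the leading entries, write a new file."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.writelines(strip_leading_entries_without_ligands(lines))
```

For example, `rewrite_index("pdb-redo-new.fasta", "pdb-redo-trimmed.fasta")` would write a trimmed copy (the output file name is made up here), which can then be passed to `--pdb-fasta`. With the file shown above, this drops the same first four entries that were removed by hand.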