PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

Problems running prepare-pdb-list #32

Closed abiadak3 closed 1 year ago

abiadak3 commented 1 year ago

I'm having some issues running the prepare-pdb-list command to filter the input PDB list.

I'm running it like this:

alphafill prepare-pdb-list --pdb-dir=./pdb --pdb-fasta=pdb_seqres-test.txt

In the "pdb" directory, there are 1000 PDBs in CIF format, and some of them contain ligands. The file "pdb_seqres-test.txt" contains the sequences in FASTA format corresponding to the files in the "pdb" directory.

The program doesn't list the PDBs with ligands; it simply outputs a blank line in less than a second. Tracing the program with strace shows that it only accesses the files "af-ligands.cif" and "alphafill.conf" and the "pdb/" directory. It does not read the ".cif" files in "pdb/" or the "pdb_seqres-test.txt" file.

The "--output" option is not recognized; it gives an error message saying "unknown option".

The program does not run if the alphafill.conf file does not exist in the directory. It gives an error message saying "the specified config file was not found". However, I cannot find in the documentation the format of that file or a list of the options it accepts, with their explanations.

Maybe I'm doing something wrong?

mhekkel commented 1 year ago

Alphafill assumes a standard layout of the PDB directory, as if you fetched it from PDB-REDO (or the PDB of course). That means files are located in subdirectories with two-character names. Of course, this is inflexible and a bit silly. But if you move your files into a directory called ./pdb/00 and then use ./pdb as the argument, the prepare-pdb-list command will probably work better.

I see room for improvement in alphafill here.

abiadak3 commented 1 year ago

OK, thanks again. It actually needs a directory structure like this:

pdb/02/102l/102l_final.cif
pdb/02/102d/102d_final.cif
pdb/00/100d/100d_final.cif

mhekkel commented 1 year ago

Ah, yes, that's the pdb-redo way of storing data.
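For anyone starting from a flat directory of CIF files, here is a minimal Python sketch that copies them into the nested layout shown above. It assumes the subdirectory name is the second and third characters of the PDB ID (which matches the three example paths, but is inferred, not confirmed by the maintainers) and that the input files are named `<id>.cif`. The helper names `redo_path` and `arrange` are made up for this example and are not part of alphafill.

```python
import shutil
from pathlib import Path


def redo_path(root: Path, pdb_id: str) -> Path:
    """Build root/<chars 2-3 of id>/<id>/<id>_final.cif (assumed layout)."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    return root / pdb_id[1:3] / pdb_id / f"{pdb_id}_final.cif"


def arrange(flat_dir: Path, root: Path) -> list:
    """Copy every <id>.cif in flat_dir into the nested layout under root."""
    copied = []
    for src in sorted(flat_dir.glob("*.cif")):
        # src.stem is the file name without ".cif"; it is assumed to be the PDB ID.
        dest = redo_path(root, src.stem)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Calling `arrange(Path("flat"), Path("pdb"))` would then leave the originals untouched and produce paths such as `pdb/00/100d/100d_final.cif`. A file whose name is not a four-character ID raises a `ValueError` instead of being silently misplaced.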

chrisjurich commented 1 year ago

Should the names in the fasta be the base cif name?

mhekkel commented 1 year ago

The name in the fasta should be

fourlettercode, underscore, asym_id of the chain

Like in:

>1cbs_A
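To check a FASTA file against that naming rule before running alphafill, something like the following Python sketch may help. It only inspects the first whitespace-separated token of each header line and assumes the asym_id is alphanumeric; whether alphafill tolerates a trailing description after the name is not stated in this thread. The function name `check_fasta_headers` is made up for this example.

```python
import re

# Assumed shape of the name: four-character PDB ID, underscore, asym_id.
HEADER_RE = re.compile(r"^>([0-9A-Za-z]{4})_([0-9A-Za-z]+)$")


def check_fasta_headers(lines):
    """Return (line_number, header) pairs for headers that do not look like '>1cbs_A'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        token = line.split()[0]  # ignore any description after the name
        if not HEADER_RE.match(token):
            bad.append((number, line.rstrip("\n")))
    return bad
```

Usage would be along the lines of `check_fasta_headers(open("pdb_seqres-test.txt"))`; an empty list means every header starts with a name of the expected form.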

rytakahas commented 1 year ago

After I was able to compile alphafill, I ran into the same issues. If you could provide users with a concrete alphafill.conf example, it would be really appreciated.

I guessed at an alphafill.conf like the following, which of course does not work:

pdb-dir=./pdb-redo
pdb-fasta=./1gos.fasta

Another question: looking at the output of

alphafill -h

is there any option to supply a UniProt FASTA? Or is that only available on the web (alphafill.eu)? Also, to avoid confusion, I get the same error as reported above:

The "--output" option is not recognized, it gives an error message saying "unknown option".

Could you update the README.md? Many thanks,

drlemmus commented 1 year ago

The fasta file should contain all the sequences that are represented in the structure data. I.e. all pdb-redo data. Putting in the Uniprot fasta makes no sense as you do not have associated structure models.

Did you try absolute paths for the data files, e.g.:

pdb-fasta=/DATA/pdb-redo/others/pdbredo_seqdb.txt
pdb-dir=/DATA/pdb-redo/
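As a rough illustration of the absolute-path suggestion, the Python sketch below writes an alphafill.conf with both paths resolved to absolute form. The one `key=value` pair per line format is an assumption inferred from the examples in this thread, not from documentation, and other options may well be required. Only the option names `pdb-dir` and `pdb-fasta` come from the thread; `write_conf` is a made-up helper.

```python
from pathlib import Path


def write_conf(conf_path, pdb_dir, pdb_fasta):
    """Write a key=value config with absolute paths and return the text written."""
    entries = {
        # resolve() turns relative paths such as ./pdb-redo into absolute ones.
        "pdb-dir": Path(pdb_dir).resolve(),
        "pdb-fasta": Path(pdb_fasta).resolve(),
    }
    text = "".join(f"{key}={value}\n" for key, value in entries.items())
    Path(conf_path).write_text(text)
    return text
```

For example, `write_conf("alphafill.conf", "./pdb-redo", "./pdbredo_seqdb.txt")` would write two lines whose values no longer depend on the directory alphafill is started from.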

rytakahas commented 1 year ago

Thanks for your feedback. I have downloaded pdb-redo to my local machine, but I don't see the file others/pdbredo_seqdb.txt. Did I perhaps fail to download all the files? My pdb-redo copy is 573G in total. How big is the complete pdb-redo database?

Many thanks,

drlemmus commented 1 year ago

It is over a TB, but that includes a lot of data you don't need. You only need the mmCIF files called ????_final.cif
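To see how much of a local copy is actually needed, a small Python sketch like this one can list the files matching `????_final.cif` and add up their sizes. It assumes the files sit somewhere below a single root directory; the function names are made up for this example.

```python
from pathlib import Path


def find_final_models(root):
    """Return all files below root whose name matches ????_final.cif, sorted."""
    return sorted(p for p in Path(root).rglob("????_final.cif") if p.is_file())


def total_size_bytes(paths):
    """Sum the on-disk sizes of the given files in bytes."""
    return sum(p.stat().st_size for p in paths)
```

Running `total_size_bytes(find_final_models("/DATA/pdb-redo"))` on a mirror gives the space taken by the models alone, which can be compared with the size of the whole download.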

You can get the sequence file through a browser but you need to be logged in on pdb-redo.eu. (https://pdb-redo.eu/others/pdbredo_seqdb.txt)

rytakahas commented 1 year ago

Many thanks, I was able to download all the pdb-redo cif files. Alphafill works nicely on my local machine.

However, I have one question regarding the alphafill DB. My local copy of alphafill_DB was downloaded more than a year ago. Has the DB behind the current website (https://alphafill.eu/) been updated since then?

For example, the UniProt ID A0A5P2XKZ4 is found on the web, but not in my local copy. If I run

rsync -av rsync://rsync.alphafill.eu/alphafill/ alphafill/

will my copy be synchronized with the current alphafill website DB?

Thanks,

drlemmus commented 1 year ago

Yes

rytakahas commented 1 year ago

I am currently updating the alphafill DB. Could you tell me how big it is? Thanks,

drlemmus commented 1 year ago

172 GB, but that will gradually go up.

mgm-14392 commented 1 year ago

Hello, would alphafill work with gzipped cif files or should I decompress them?

mhekkel commented 1 year ago

Alphafill should work with gzipped files.
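If you also want your own helper scripts (for example, a quick check of which files contain ligands) to handle both plain and gzipped files, a minimal Python sketch could look like this. It says nothing about how alphafill itself reads the files, and `open_cif` is a made-up name.

```python
import gzip
from pathlib import Path


def open_cif(path):
    """Open a .cif or .cif.gz file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        # gzip.open in "rt" mode decompresses and decodes on the fly.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

Both variants return a text file object, so code such as `with open_cif(p) as fh: first = fh.readline()` works unchanged for either kind of file.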