Closed JonasLi-19 closed 1 year ago
The caches are for faster training. It is much more efficient to memory-map a single large file than to open and close many small files. The "types" file is a list of all the examples in your training set with their labels, like this: https://raw.githubusercontent.com/gnina/models/master/data/PDBBind2016/General_types/ccv_gen_norec_uff_0_test0.types
The cache is for the structures (atomic positions & atom types) of the receptor+ligand pairs. The labels of the pose quality and binding affinity are in the "types" file, as David said.
You absolutely can generate a cache for an entire set. We provide caches for CrossDocked2020, which contains some 22.5 million poses. To generate a cache, you first need to generate the correct types files, which is its own pipeline. Then you can pass the types file(s) to create_caches2.py to build a custom cache from them.
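As a rough sketch of what the types-generation step produces, the snippet below formats one line per receptor+ligand pose. Everything specific in it is an assumption to check against the linked example file and your own model: the column order (label, affinity, RMSD, receptor, ligand), the 2 Å RMSD cutoff used to derive the pose label, the helper names, and the file paths, which are hypothetical.

```python
def label_from_rmsd(rmsd, threshold=2.0):
    """Return 1 for a 'good' pose and 0 otherwise (assumed 2 Angstrom cutoff)."""
    return 1 if rmsd < threshold else 0


def types_line(label, affinity, rmsd, rec_path, lig_path):
    """Format one example as a space-separated line (assumed column order)."""
    return f"{label} {affinity:.4f} {rmsd:.4f} {rec_path} {lig_path}"


def build_types(poses):
    """poses: iterable of (affinity, rmsd, receptor_path, ligand_path)."""
    return [
        types_line(label_from_rmsd(rmsd), affinity, rmsd, rec, lig)
        for affinity, rmsd, rec, lig in poses
    ]


if __name__ == "__main__":
    # Hypothetical poses for one complex stored in a directory named by PDB ID.
    poses = [
        (6.0, 0.59, "1abc/1abc_rec.gninatypes", "1abc/1abc_docked_0.gninatypes"),
        (6.0, 4.27, "1abc/1abc_rec.gninatypes", "1abc/1abc_docked_1.gninatypes"),
    ]
    print("\n".join(build_types(poses)))
```

Writing these lines to a file gives a types file whose receptor column index is what the cache script's docstring refers to as the index the receptor starts on.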
Thanks for your reply, I have learned a lot! Now I want to make sure: if I train the model on a types file, there is no need to change or remove the following lines in the model, right?
ligmolcache: "LIGCACHE_FILE"
recmolcache: "RECCACHE_FILE"
You need to modify those lines. As written, Caffe will look for a file on your computer called LIGCACHE_FILE; that lookup will fail, and the process will likely crash.
If you want to use a cache, you have to change LIGCACHE_FILE and RECCACHE_FILE to the name of the corresponding cache files on your machine. If you do NOT want to use a cache, then you can delete those lines from the caffe model file.
Note: there are two instances of these lines in our provided .model files -- one for the test and one for the train.
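For illustration, here is a minimal sketch of applying that edit to every instance at once. It treats the model file as plain text and only touches the two lines quoted above; the function name `set_caches` and the cache file names in the usage note are hypothetical, and this is not a tool shipped with the models.

```python
def set_caches(model_text, lig_cache=None, rec_cache=None):
    """Fill in or drop the ligmolcache/recmolcache lines of a model definition.

    Passing a path replaces the placeholder with it; passing None removes
    the line so that no cache is used. All instances (train and test) are handled.
    """
    placeholders = {
        "ligmolcache:": ("LIGCACHE_FILE", lig_cache),
        "recmolcache:": ("RECCACHE_FILE", rec_cache),
    }
    kept = []
    for line in model_text.splitlines():
        stripped = line.strip()
        for prefix, (placeholder, cache_path) in placeholders.items():
            if stripped.startswith(prefix):
                # Drop the line when no cache is wanted, else fill in the real path.
                line = None if cache_path is None else line.replace(placeholder, cache_path)
                break
        if line is not None:
            kept.append(line)
    return "\n".join(kept)
```

Calling `set_caches(text, "lig.molcache2", "rec.molcache2")` fills in both placeholders wherever they appear, while `set_caches(text)` removes the lines so training reads the individual gninatypes files instead.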
How to create a cache file for multiple receptors and ligands (they are in gninatypes format)
1. I know I can use gninatyper to convert ligands and receptors into gninatypes, but I do not know exactly how to use create_caches2.py to generate a cache file for a ligand and receptor pair (each pair is in its own directory named by PDB ID, and there are definitely docked poses and crystal ligands). [BTW, considering the time cost, is it necessary to convert them into a cache rather than just using a types file?]
Specifically, I have no idea how to add the RMSD and affinity to the types or cache file. Do they require CSV files?
Here is the create_caches2.py description. It does not say what -fname is, and it does not mention how to add RMSD and affinity data to each line of the file. '''Takes a bunch of types training files. First argument is what index the receptor starts on (ligands are assumed to be right after). Reads in the gninatypes files specified in these types files and writes out two monolithic receptor and ligand cache files in version 2 format. Version 2 is optimized for memory mapped storage of caches. keys (file names) are stored first followed by dense storage of values (coordinates and types). '''
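To make the "keys first, then dense values" idea in that docstring concrete, here is a toy sketch of such a layout read back through a memory map. It is NOT the actual version 2 cache format: the byte layout, the `ATOM` record, and all function names are invented purely for illustration.

```python
import mmap
import os
import struct
import tempfile

ATOM = struct.Struct("<fffi")  # toy record: x, y, z, atom type index
HEADER = struct.Struct("<IQ")  # toy header: key count, index size in bytes


def write_toy_cache(path, entries):
    """entries: dict mapping a file name to a list of (x, y, z, type) tuples."""
    index = b""
    values = b""
    for name, atoms in entries.items():
        key = name.encode()
        # Each index record: key length, key, offset into values, atom count.
        index += struct.pack("<H", len(key)) + key
        index += struct.pack("<QI", len(values), len(atoms))
        values += b"".join(ATOM.pack(*atom) for atom in atoms)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(entries), len(index)))
        f.write(index)   # keys are stored first
        f.write(values)  # followed by densely packed values


def open_toy_cache(path):
    """Memory-map the file and build a name -> (offset, atom count) lookup."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    count, index_size = HEADER.unpack_from(mm, 0)
    pos = HEADER.size
    base = pos + index_size  # where the dense value section starts
    lookup = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<H", mm, pos)
        pos += 2
        key = mm[pos:pos + key_len].decode()
        pos += key_len
        offset, n_atoms = struct.unpack_from("<QI", mm, pos)
        pos += 12
        lookup[key] = (base + offset, n_atoms)
    return mm, lookup


def read_atoms(mm, lookup, name):
    """Read one structure straight out of the mapped file."""
    start, n_atoms = lookup[name]
    return [ATOM.unpack_from(mm, start + i * ATOM.size) for i in range(n_atoms)]
```

The point of the design is that one file is opened once and each structure is found by name in the index, which is why a cache can be faster than opening a separate gninatypes file per example.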