minor issues about hdf5 files

DeepRank / deeprank

This repository has been integrated in https://github.com/DeepRank/deeprank2

Apache License 2.0


minor issues about hdf5 files #5

Open LilySnow opened 6 years ago

LilySnow commented 6 years ago

Maybe we should not label the pdb file of a model "native" in the BM4 hdf5 files (e.g., 1E6E.hdf5), and call it "pdb" instead:

In [8]: list(f['1E6E_9w'])
Out[8]: ['complex', 'features', 'features_raw', 'grid_points', 'mapped_features', 'native', 'targets']

Shall we also put the HADDOCK score in the BM4 hdf5 files, for easy comparison with the HADDOCK scoring function?
In the final output data.hdf5 file, shall we also store the model IDs? Currently it only contains the target DockQ and the predicted DockQ, which makes it inconvenient to compare with other methods.
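To make the model-ID request concrete, here is a minimal sketch of an output file that keeps the IDs next to the two DockQ columns. The dataset names (`model_id`, `target_dockq`, `predicted_dockq`) and both helper functions are hypothetical; they are not the layout DeepRank currently writes.

```python
import h5py
import numpy as np


def write_predictions(path, model_ids, target_dockq, predicted_dockq):
    """Write per-model DockQ values together with the model IDs (hypothetical layout)."""
    if not (len(model_ids) == len(target_dockq) == len(predicted_dockq)):
        raise ValueError("model_ids, target_dockq and predicted_dockq must have equal length")
    with h5py.File(path, "w") as f:
        # Variable-length strings keep IDs such as '1E6E_9w' readable.
        f.create_dataset("model_id", data=list(model_ids), dtype=h5py.string_dtype())
        f.create_dataset("target_dockq", data=np.asarray(target_dockq, dtype="f8"))
        f.create_dataset("predicted_dockq", data=np.asarray(predicted_dockq, dtype="f8"))


def read_predictions(path):
    """Return (model_ids, target_dockq, predicted_dockq) from a file written above."""
    with h5py.File(path, "r") as f:
        raw_ids = f["model_id"][()]
        # Depending on the h5py version, strings come back as bytes or str.
        ids = [s.decode("utf-8") if isinstance(s, bytes) else s for s in raw_ids]
        return ids, f["target_dockq"][()], f["predicted_dockq"][()]
```

With the IDs stored row by row, the predictions could be joined against HADDOCK scores or any other method's ranking by model name.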

NicoRenaud commented 6 years ago

'complex' is the pdb of the conformation (e.g. 1E6E_9w.pdb) and 'native' is the corresponding native conformation (e.g. 1E6E.pdb). That being said, we can rename all of that very easily.

LilySnow commented 6 years ago

Inside 1E6E.hdf5, we already have a folder for the native, as below, right? Should we then remove the "native" entry for all models inside 1E6E.hdf5, since they seem to be redundant?

In [6]: list(f['1E6E'])
Out[6]: ['complex', 'features', 'features_raw', 'grid_points', 'mapped_features', 'native', 'targets']

NicoRenaud commented 6 years ago

We can 'clean' the hdf5 file and remove all entries that are not needed, but this comes at the cost of possibly not being able to add new data to it.
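For reference, such a cleaning step could look roughly like the sketch below, using plain h5py. The helper name `remove_entries` is hypothetical and is not an existing DeepRank function.

```python
import h5py


def remove_entries(path, model_name, keys):
    """Unlink the given sub-entries of one model group; return the keys actually removed."""
    removed = []
    with h5py.File(path, "a") as f:
        group = f[model_name]
        for key in keys:
            if key in group:
                # 'del' unlinks the object from the group.
                del group[key]
                removed.append(key)
    return removed
```

Note that deleting an object in HDF5 only unlinks it; the file on disk does not shrink until it is rewritten, for example with the `h5repack` tool.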

LilySnow commented 6 years ago

Sorry, I do not understand. Why do we have to have a copy of the native pdb file for each of the models?

NicoRenaud commented 6 years ago

It's needed to compute i-RMSD, l-RMSD and DockQ. For convenience it's stored there as well, but it can be removed if needed.
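To illustrate why the native structure is required, the sketch below combines the three quantities following the published DockQ definition (scaling constants 1.5 Å for i-RMSD and 8.5 Å for l-RMSD). Each input is measured against the native complex. This is an illustration only; DeepRank's own implementation is not shown in this thread and may differ in detail.

```python
def dockq(fnat, irmsd, lrmsd):
    """Combine Fnat, i-RMSD and l-RMSD (both in Angstrom) into a DockQ score in [0, 1]."""
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    if irmsd < 0.0 or lrmsd < 0.0:
        raise ValueError("RMSD values must be non-negative")
    # Each RMSD is mapped to (0, 1], where 1 means identical to the native.
    scaled_irmsd = 1.0 / (1.0 + (irmsd / 1.5) ** 2)
    scaled_lrmsd = 1.0 / (1.0 + (lrmsd / 8.5) ** 2)
    return (fnat + scaled_irmsd + scaled_lrmsd) / 3.0
```

A model identical to the native (`fnat=1`, both RMSDs zero) scores 1, and the score decreases as the model drifts away from the native.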

LilySnow commented 6 years ago

But the pdb file of the native is already in 1E6E.hdf5 as a separate entry, right?
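If the redundancy is the main concern, one possible middle ground is an HDF5 hard link: every model keeps a 'native' entry, but all of them point at a single stored object. The sketch below assumes the native pdb lives at a path such as '1E6E/complex'; that path and the helper `share_native` are assumptions for illustration, not part of DeepRank.

```python
import h5py


def share_native(path, source_path, model_names):
    """Make each model's 'native' entry a hard link to one shared object."""
    with h5py.File(path, "a") as f:
        source = f[source_path]
        for name in model_names:
            group = f[name]
            if "native" in group:
                # Drop the per-model copy before linking.
                del group["native"]
            # Assigning an existing object creates a hard link, not a copy.
            group["native"] = source
```

Code that reads `f['1E6E_9w']['native']` would keep working unchanged, while the data would be stored only once.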