datreant / datreant.data

convenient data storage and retrieval in HDF5 for Treants
http://datreant.org/
BSD 3-Clause "New" or "Revised" License

Keep track of last modification time #11

Closed jbarnoud closed 7 years ago

jbarnoud commented 7 years ago

Knowing when data was last modified would tell me whether it is up to date. For instance, it would let me rerun an analysis on all Treants for which the data of interest is older than the analysis script, or older than the trajectory.

Since data are stored in directories, there could be a JSON file in that directory with some metadata. Alternatively, datreant.data could access the file system metadata for the HDF5 file.

dotsdl commented 7 years ago

You can access file/directory attributes of Leaves/Trees with their path property. This has a method called stat which will return an os.stat_result object that has attributes such as st_mtime that you can use for this purpose:

import datreant.core as dtr

t = dtr.Treant('sprout')
t['some_data/pdData.h5'].path.stat().st_mtime

Does this solve your problem? I think we'll avoid exposing these stat attributes in the Tree and Leaf properties, since these will vary by system (e.g. Windows).
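To illustrate the use case from the original request (rerun an analysis when the data is older than the script or the trajectory), here is a minimal sketch using only pathlib. The helper name is_stale and the file names in the usage note are hypothetical and are not part of datreant; the function just compares st_mtime values of plain file paths.

```python
from pathlib import Path


def is_stale(data_file, *dependencies):
    """Return True if data_file is missing or older than any dependency.

    Hypothetical helper, not part of datreant: it only compares the
    st_mtime of ordinary file paths.
    """
    data_file = Path(data_file)
    if not data_file.exists():
        # No stored data yet, so the analysis has to run.
        return True
    data_mtime = data_file.stat().st_mtime
    # Stale if at least one dependency was modified after the data file.
    return any(Path(dep).stat().st_mtime > data_mtime for dep in dependencies)
```

With this, something like is_stale('sprout/some_data/pdData.h5', 'analysis.py', 'traj.xtc') would tell you whether the stored data needs regenerating (all three paths are made up for the example). Note that modification times depend on the file system and can change when files are copied, so treat this as a heuristic.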

jbarnoud commented 7 years ago

It should work indeed. Is there an easy way of knowing the file name from the data key?

dotsdl commented 7 years ago

@jbarnoud actually yes, by convention the data limb stores pandas objects in HDF5 files created with PyTables with names pdData.h5, numpy arrays in HDF5 files created with h5py with names npData.h5, and anything else as pickles with names pyData.pkl. One thing you could do is to use globbing:

# glob returns a View giving all matches; we select the first element
t.glob('some_data/*.h5')[0].path.stat().st_mtime

# to match all possibilities (.h5 or .pkl), combine both globs
(t.glob('some_data/*.h5') + t.glob('some_data/*.pkl'))[0].path.stat().st_mtime

This is the most general way to do this, I think, since you could do the same with any files in a Tree/Treant's Tree. Something we could do is add a method to .data such as filepath that would give you the full path to the named data's datafile, but it would essentially do the above under the hood.
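As a rough idea of what such a shortcut might do, here is a sketch written against plain pathlib rather than the datreant API. The function name data_filepath and its signature are assumptions for illustration only; the three file names are the conventions described above.

```python
from pathlib import Path

# File names the data limb uses by convention, as described above.
DATA_FILENAMES = ("pdData.h5", "npData.h5", "pyData.pkl")


def data_filepath(data_dir):
    """Return the first conventional data file found in data_dir, or None.

    Hypothetical sketch: takes the directory of a named dataset and
    checks for each conventional file name in turn.
    """
    data_dir = Path(data_dir)
    for name in DATA_FILENAMES:
        candidate = data_dir / name
        if candidate.is_file():
            return candidate
    return None
```

Given the directory for a dataset, data_filepath(directory) returns a Path whose stat().st_mtime gives the last modification time, or None if no data file is present. Checking fixed names instead of globbing avoids picking up unrelated .h5 or .pkl files that happen to sit in the same directory.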

jbarnoud commented 7 years ago

I like the idea of the filepath shortcut. I can try to take care of it in the not too distant future.

(Should I close this issue and open another one for data.filepath?)

dotsdl commented 7 years ago

@jbarnoud yeah that would help to bring focus to the issue. Please open another one and link back to the discussion here. Thanks!