Closed: jbarnoud closed this issue 7 years ago
You can access file/directory attributes of Leaves/Trees with their path property. This has a method called stat which will return an os.stat_result object that has attributes such as st_mtime that you can use for this purpose:
import datreant.core as dtr
t = dtr.Treant('sprout')
t['some_data/pdData.h5'].path.stat().st_mtime
Does this solve your problem? I think we'll avoid exposing these stat attributes as Tree and Leaf properties, since they vary by system (e.g. Windows).
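For reference, st_mtime is a float giving seconds since the epoch, so it usually needs converting before it is readable. A minimal sketch using only the standard library; the helper name last_modified is hypothetical and not part of datreant:

```python
import os
from datetime import datetime, timezone


def last_modified(path):
    """Return the last-modification time of `path` as a UTC datetime.

    Hypothetical helper, not part of datreant. `path` may be a string or
    a pathlib-style object; it is converted with str() before the call.
    """
    mtime = os.stat(str(path)).st_mtime  # seconds since the epoch (float)
    return datetime.fromtimestamp(mtime, tz=timezone.utc)
```

Assuming the path property behaves as described above, this could be called as last_modified(t['some_data/pdData.h5'].path).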
It should work indeed. Is there an easy way of knowing the file name from the data key?
@jbarnoud actually yes, by convention the data limb stores pandas objects in HDF5 files created with PyTables with names pdData.h5, numpy arrays in HDF5 files created with h5py with names npData.h5, and anything else as pickles with names pyData.pkl. One thing you could do is to use globbing:
# glob returns a View giving all matches; we select the first element
t.glob('some_data/*.h5')[0].path.stat().st_mtime
# to match all possibilities (.h5 or .pkl) we could do
(t.glob('some_data/*.h5') + t.glob('some_data/*.pkl'))[0].path.stat().st_mtime
This is the most general way to do this, I think, since you could do the same with any file in a Tree or in a Treant's tree. Something we could do is add a method to .data such as filepath that would give you the full path to the named data's datafile, but it would essentially do the above under the hood.
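As a rough illustration of what such a shortcut might do, here is a sketch that works on plain directories with pathlib rather than through datreant. The function name data_filepath and the assumption that each data key maps to a subdirectory holding exactly one of the three conventional files are illustrative guesses, not the library's actual implementation:

```python
from pathlib import Path

# File names used by convention for stored data, as described above.
DATAFILE_NAMES = ("pdData.h5", "npData.h5", "pyData.pkl")


def data_filepath(treant_dir, key):
    """Return the path of the data file stored under `key`, or None.

    Hypothetical helper: assumes `key` names a subdirectory of
    `treant_dir` that holds one of the conventional data files.
    """
    data_dir = Path(treant_dir) / key
    for name in DATAFILE_NAMES:
        candidate = data_dir / name
        if candidate.is_file():
            return candidate
    return None  # no data file found for this key
```

Checking the three known names directly avoids picking up unrelated .h5 or .pkl files that a glob pattern would also match.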
I like the idea of the filepath shortcut. I can try to take care of it in the not too distant future.
(Should I close this issue and open another one for data.filepath?)
@jbarnoud yeah that would help to bring focus to the issue. Please open another one and link back to the discussion here. Thanks!
Knowing when a piece of data was last modified would allow me to tell whether it is up to date. For instance, I could rerun an analysis on all Treants for which the data of interest is older than the analysis script or older than the trajectory.
Since data are stored in directories, there could be a JSON file in that directory with some metadata. Alternatively, datreant.data could access the file system metadata for the HDF5 file.
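The up-to-date check described in this issue can be sketched with file system metadata alone. The function name is_stale is hypothetical, and the sketch assumes modification times are a reliable proxy for freshness, which may not hold after copying files or on file systems with coarse timestamps:

```python
import os


def is_stale(data_file, *dependencies):
    """Return True if `data_file` is missing or older than any dependency.

    Hypothetical helper: `dependencies` are paths such as the analysis
    script or the trajectory that the data was derived from.
    """
    if not os.path.exists(data_file):
        return True  # nothing stored yet, so the analysis must run
    data_mtime = os.stat(data_file).st_mtime
    return any(os.stat(dep).st_mtime > data_mtime for dep in dependencies)
```

An analysis loop could then skip every Treant for which is_stale returns False for the data file of interest.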