UDST / sanfran_urbansim

An UrbanSim for San Francisco: an example implementation of the new framework

.h5 dataset attributes #9

Closed: lisalan520 closed this issue 9 years ago

lisalan520 commented 9 years ago

Hi, I am new to both UrbanSim and Python, and I am having some difficulty understanding the data in the sanfran_public.h5 file.

I used the h5py module and found the 6 groups in the file, along with the datasets (name, shape, type) in each group. For example, here is the information for the datasets in the Buildings table:

Buildings [<HDF5 dataset "axis0": shape (9,), type "|S23">, <HDF5 dataset "axis1": shape (152605,), type "<i8">, <HDF5 dataset "block0_items": shape (6,), type "|S20">, <HDF5 dataset "block0_values": shape (152605, 6), type "<f4">, <HDF5 dataset "block1_items": shape (2,), type "|S23">, <HDF5 dataset "block1_values": shape (152605, 2), type "<f8">, <HDF5 dataset "block2_items": shape (1,), type "|S9">, <HDF5 dataset "block2_values": shape (152605, 1), type "<i4">]

From the shape and type information, I guess that axis0 and axis1 refer to the columns and rows of the table, and it seems there are 6 columns in block0, 2 in block1, and 1 in block2. Is this analysis right? If so, how can I find out the exact attributes in each block?

Thanks!

jiffyclub commented 9 years ago

Hi @lisalan520, that file is created by Pandas's HDFStore utility, so it might look a bit unnatural if you open it straight up in h5py. Open it up with HDFStore and use the .keys() method to see the names of all the tables stored in the file. You'll also be able to load the tables from the h5 file as Pandas DataFrames.

You can read more about HDFStore at http://pandas.pydata.org/pandas-docs/dev/io.html#io-hdf5.
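As a rough illustration of the suggestion above, here is a minimal sketch that opens an HDFStore file read-only and collects the column names and dtypes of every table in it. The helper name `describe_store` is made up for this example, and the sketch assumes pandas and PyTables are installed and that every key in the file holds a DataFrame.

```python
import pandas as pd


def describe_store(path):
    """Map each table key in a pandas HDFStore file to its column dtypes.

    `describe_store` is a hypothetical helper written for this example;
    it is not part of pandas or UrbanSim.
    """
    dtypes_by_table = {}
    # Open read-only so the data file cannot be modified by accident.
    with pd.HDFStore(path, mode="r") as store:
        for key in store.keys():       # keys look like "/some_table"
            frame = store[key]         # load the table as a DataFrame
            # frame.dtypes is a Series: index = column names, values = dtypes.
            dtypes_by_table[key] = frame.dtypes
    return dtypes_by_table
```

Calling `describe_store("sanfran_public.h5")` (adjust the path to wherever the file lives) and printing each entry should list the attributes of every table. The `block0`, `block1`, and `block2` datasets visible through h5py appear to be pandas' internal grouping of columns by dtype, which is why the dtypes are included alongside the column names.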

lisalan520 commented 9 years ago

Thank you, @jiffyclub. Now I'm able to see the attributes of each table in the h5 file. This is really helpful, thanks!

jiffyclub commented 9 years ago

Great!