Closed: andkay closed this 3 years ago
Since outputs are very large, can we enable exporting to a more efficient storage format? For Londinium (which is a very small simulation) the output file was 150MB, but reduced to 45MB when exported to HDF5 (a 70% size reduction).
seems possible but i think more of a faff... https://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset
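For reference, the approach in that FAQ boils down to creating a resizable dataset and growing it per chunk. A minimal sketch (the file name, dataset name, and dtype here are illustrative, not from the actual output schema):

```python
import h5py
import numpy as np

with h5py.File("events.h5", "w") as f:
    # maxshape=(None,) makes the first axis unbounded, so the dataset
    # can be resized after creation.
    dset = f.create_dataset("travel_times", shape=(0,), maxshape=(None,), dtype="f8")

    for chunk in (np.arange(3.0), np.arange(3.0, 6.0)):
        # Grow the dataset, then write the new chunk into the tail.
        dset.resize((dset.shape[0] + len(chunk),))
        dset[-len(chunk):] = chunk
```

This gives a single contiguous dataset rather than one table per chunk, at the cost of managing the resize bookkeeping ourselves.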
I think the low-hanging fruit is to optionally allow ChunkWriters to write HDF5 files. I'll have a think on how to implement this somewhat elegantly.
@Theodore-Chatziioannou -- I've refactored the event processing to try to pop the required dict entry, passing on key errors. Deleting is no longer required, and this will also catch the errors where the leg enters/leaves traffic on the same link (and hence has no event of type "enters link").
Also added a wrapper for the DataFrame write methods -- it is currently functional, but this isn't a great implementation, mostly because it locks a single file format to the handler objects.
It would be much better to allow the chunk writer to accept a list of write methods directly, as in:

```python
self.writers = [pd.DataFrame.to_csv, pd.DataFrame.to_hdf]

for writer in self.writers:
    writer(chunk_df, path, ...)
```
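To make the idea concrete, here is a toy stand-in for that design (this `ChunkWriter` class and its method names are illustrative, not the actual handler class). It passes unbound `pd.DataFrame.to_*` methods as plain callables, with one output path per writer since each format needs its own extension:

```python
import pandas as pd

class ChunkWriter:
    """Toy stand-in: fans each chunk out to every configured DataFrame
    write method. Any unbound pd.DataFrame.to_* method can be passed in."""

    def __init__(self, writers):
        # e.g. [pd.DataFrame.to_csv, pd.DataFrame.to_hdf]
        self.writers = writers

    def write_chunk(self, chunk_df, paths):
        # Each writer is called as writer(df, path), i.e. the unbound
        # method receives the DataFrame as its first argument.
        for writer, path in zip(self.writers, paths):
            writer(chunk_df, path)

# Demo with CSV only (to_hdf would additionally need a key= argument).
writer = ChunkWriter([pd.DataFrame.to_csv])
df = pd.DataFrame({"veh_id": ["a", "b"], "duration": [10, 20]})
writer.write_chunk(df, ["chunk_0.csv"])
```

One wrinkle with this approach: the writers don't share a signature (`to_hdf` requires `key=`, `to_csv` doesn't), so in practice each writer probably needs to be wrapped in a small adapter or `functools.partial`.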
Please note that the h5 stores generated require keying the individual chunk tables -- so they need to be read back iteratively, as in:

```python
hdf_obj = pd.HDFStore(path)
keys = hdf_obj.keys()

df_hdf = pd.DataFrame()
for key in keys:
    df_hdf = pd.concat(
        [df_hdf, pd.read_hdf(hdf_obj, key)]
    )
hdf_obj.close()
```
Couple of minor changes. This is no longer draft -- just need final approval from @fredshone or @Theodore-Chatziioannou to merge.
I've added a VehicleLinkLog handler that can be used for carbon calculations. It will require unit testing before merging -- but equally importantly, it should be tested on some simulation data, because this is a much bigger log than our usual fare.
Handler output looks like this:
etc.
Because there is strong potential for the entry/exit events to be split across chunks, I've added a simple dictionary (`{veh_id: {data}}`) to stage the data before it is emitted to a ChunkWriter. Logic by event type is:

- "enters link": stage the data.
- "left link": check the staging to see if a record of a link entry exists. If so, update and emit, then delete the vehicle data from staging. The check will, in the first place, ignore vehicles that have used the link via the "enters traffic" event type.
- "leaves traffic": delete the vehicle data from staging. In this case, the vehicle has entered the link, but only to access a facility at the end of its leg, and will not leave the link via the final node.

I believe this works because these events are guaranteed to be time-ordered, but correct me if I'm wrong.