arup-group / elara

Command line utility for processing MATSim events output files.
MIT License

Veh link logs #151

Closed andkay closed 3 years ago

andkay commented 3 years ago

I've added a VehicleLinkLog handler that can be used for carbon calculations. It will require unit testing before merging -- but, equally importantly, it should be tested on some simulation data because this is a much bigger log than our usual fare.

Handler output looks like this:

| veh_id | veh_mode | link_id | entry_time | exit_time |
|--------|----------|---------|------------|-----------|
| chris  | car      | 2-3     | 25201      | 25656     |
| bus1   | bus      | 2-3     | 27301      | 27756     |
| nick   | car      | 2-1     | 28801      | 28806     |

etc.

Because there is strong potential for the entry/exit events to be split across chunks, I've added a simple dictionary (`{veh_id: {data}}`) to stage the data before it's emitted to a ChunkWriter. The logic by event type is:

I believe this works because these events are guaranteed to be time-ordered, but correct me if I'm wrong.
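For reference, the staging approach described above could be sketched roughly like this (a minimal sketch only -- the event and field names here are hypothetical placeholders, not the handler's actual API):

```python
# Sketch of staging vehicle link entry/exit events across chunks.
# Event types and field names are hypothetical, not elara's real API.

staged = {}  # {veh_id: partially built row}
rows = []    # completed rows ready for the ChunkWriter

def handle_event(event):
    if event["type"] == "entered link":
        # Stage the entry; it is completed by the matching exit event.
        staged[event["veh_id"]] = {
            "veh_id": event["veh_id"],
            "link_id": event["link_id"],
            "entry_time": event["time"],
        }
    elif event["type"] == "left link":
        # Pop the staged entry; ignore exits with no matching entry
        # (e.g. a leg that enters/leaves traffic on the same link).
        try:
            row = staged.pop(event["veh_id"])
        except KeyError:
            return
        row["exit_time"] = event["time"]
        rows.append(row)
```

Because the events are time-ordered, each exit event can safely complete (and remove) the staged entry for its vehicle.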

Theodore-Chatziioannou commented 3 years ago

Since outputs are very large, can we enable exporting to a more efficient storage format? For Londinium (which is a very small simulation) the output file was 150MB, but this reduced to 45MB when exported to HDF5 (a 70% size reduction).

fredshone commented 3 years ago

Seems possible, but I think it's a bit more of a faff... https://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset

andkay commented 3 years ago

I think the low-hanging fruit is to optionally allow ChunkWriters to write HDF5 files. I'll have a think on how to implement it somewhat elegantly.

andkay commented 3 years ago

@Theodore-Chatziioannou -- I've refactored the event processing to try to pop the required dict entry, passing on KeyErrors. Deleting is no longer required, and this will also catch the cases where a leg enters/leaves traffic on the same link (and hence has no "entered link" event).

Also added a wrapper for the DataFrame write methods -- it is currently functional, but this isn't a great implementation, mostly because it locks a single file format to the handler objects.

It would be much better to allow the ChunkWriter to accept a list of write methods directly, as in:

self.writers = [pd.DataFrame.to_csv, pd.DataFrame.to_hdf]

for writer in self.writers:
    writer(chunk_df, path, ...)
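A minimal version of that idea could look like the sketch below. The class and attribute names here are illustrative, not elara's actual ChunkWriter API:

```python
import pandas as pd

# Illustrative sketch only -- hypothetical class, not elara's real ChunkWriter.
class ChunkWriter:
    def __init__(self, writers=None):
        # Unbound pandas DataFrame write methods, called as writer(df, path).
        self.writers = writers or [pd.DataFrame.to_csv]

    def write(self, chunk_df, path):
        # Emit the chunk once per configured output format.
        for writer in self.writers:
            writer(chunk_df, path)
```

In practice the different write methods take different arguments (e.g. `to_hdf` requires a `key`), so each entry would likely need to be wrapped with `functools.partial` to bind format-specific kwargs before being handed to the writer list.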

Please note that the h5 stores generated require keying the individual chunk tables -- so the file needs to be read back iteratively, as in:

with pd.HDFStore(path) as hdf_obj:
    df_hdf = pd.concat(
        pd.read_hdf(hdf_obj, key) for key in hdf_obj.keys()
    )

df_hdf

andkay commented 3 years ago

A couple of minor changes. This is no longer a draft -- I just need final approval from @fredshone or @Theodore-Chatziioannou to merge.