arup-group / elara

Command line utility for processing MATSim events output files.
MIT License

Veh link logs #151

Closed andkay closed 3 years ago

andkay commented 3 years ago

I've added a VehicleLinkLog handler that can be used for carbon calculations. It will require unit testing before merging -- but, equally importantly, it should be tested on some simulation data because this is a much bigger log than our usual fare.

Handler output looks like this:

| veh_id | veh_mode | link_id | entry_time | exit_time |
|--------|----------|---------|------------|-----------|
| chris  | car      | 2-3     | 25201      | 25656     |
| bus1   | bus      | 2-3     | 27301      | 27756     |
| nick   | car      | 2-1     | 28801      | 28806     |

etc.

Because there is strong potential for the entry/exit events to be split across chunks, I've added a simple dictionary (`{veh_id: {data}}`) to stage the data before it's emitted to a ChunkWriter. The logic by event type is:

I believe this works because these events are guaranteed to be time-ordered, but correct me if I'm wrong.
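For reference, the staging approach described above could be sketched roughly like this (a minimal sketch only -- the event and field names here are hypothetical placeholders, not the handler's actual API):

```python
# Sketch of staging vehicle link entry/exit events across chunks.
# Event types and field names are hypothetical, not elara's real API.

staged = {}  # {veh_id: partially built row}
rows = []    # completed rows ready for the ChunkWriter

def handle_event(event):
    if event["type"] == "entered link":
        # Stage the entry; it is completed by the matching exit event.
        staged[event["veh_id"]] = {
            "veh_id": event["veh_id"],
            "link_id": event["link_id"],
            "entry_time": event["time"],
        }
    elif event["type"] == "left link":
        # Pop the staged entry; ignore exits with no matching entry
        # (e.g. a leg that enters/leaves traffic on the same link).
        try:
            row = staged.pop(event["veh_id"])
        except KeyError:
            return
        row["exit_time"] = event["time"]
        rows.append(row)
```

Because the events are time-ordered, each exit event can safely complete (and remove) the staged entry for its vehicle.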

Theodore-Chatziioannou commented 3 years ago

Since outputs are very large, can we enable exporting to a more efficient storage format? For Londinium (which is a very small simulation) the output file was 150MB, but this reduced to 45MB when exported to HDF5 (a 70% size reduction).

fredshone commented 3 years ago

Seems possible, but I think it's a bit more of a faff... https://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset

andkay commented 3 years ago

I think the low-hanging fruit is to optionally allow ChunkWriters to write HDF5 files. I'll have a think on how to implement it somewhat elegantly.

andkay commented 3 years ago

@Theodore-Chatziioannou -- I've refactored the event processing to try to pop the required dict entry, passing on KeyErrors. Deleting is no longer required, and this will also catch the cases where a leg enters/leaves traffic on the same link (and hence has no "entered link" event).

Also added a wrapper for the DataFrame write methods -- it is currently functional, but this isn't a great implementation, mostly because it locks a single file format to the handler objects.

It would be much better to allow the ChunkWriter to accept a list of write methods directly, as in:

self.writers = [pd.DataFrame.to_csv, pd.DataFrame.to_hdf]

for writer in self.writers:
    writer(chunk_df, path, ...)
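A minimal version of that idea could look like the sketch below. The class and attribute names here are illustrative, not elara's actual ChunkWriter API:

```python
import pandas as pd

# Illustrative sketch only -- hypothetical class, not elara's real ChunkWriter.
class ChunkWriter:
    def __init__(self, writers=None):
        # Unbound pandas DataFrame write methods, called as writer(df, path).
        self.writers = writers or [pd.DataFrame.to_csv]

    def write(self, chunk_df, path):
        # Emit the chunk once per configured output format.
        for writer in self.writers:
            writer(chunk_df, path)
```

In practice the different write methods take different arguments (e.g. `to_hdf` requires a `key`), so each entry would likely need to be wrapped with `functools.partial` to bind format-specific kwargs before being handed to the writer list.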

Please note that the h5 stores generated require keying the individual chunk tables -- so the file needs to be read back iteratively, as in:

with pd.HDFStore(path) as hdf_obj:
    df_hdf = pd.concat(
        pd.read_hdf(hdf_obj, key) for key in hdf_obj.keys()
    )

df_hdf

andkay commented 3 years ago

A couple of minor changes. This is no longer a draft -- I just need final approval from @fredshone or @Theodore-Chatziioannou to merge.