arup-group / elara

Command line utility for processing MATSim events output files.
MIT License
14 stars 4 forks source link

Road pricing update #87

Closed sarah-e-hayes closed 3 years ago

sarah-e-hayes commented 3 years ago

Two changes: (1) now handles toll link tags if provided and

(2) euclidean distance handler has two additional options (mode and purpose ) to allow cross tabulation of distance by an additional field. Note that the option all provides the aggregate (not cross tabulated) distance breakdown. One trouble I had was that to allow these additional options, I had to specify that they were acceptable options for the trip logs handler which the euclidean distance depends on. As a result, it now outputs trip_log_mode.csv and trip_log_purpose.csv which are duplicates of the standard trip_log_all.csv. I couldn't see a way around this - any ideas?

fredshone commented 3 years ago

I am catious. A big priority for elara is to remove as much complexity as possible. You guys are super users - I know you can handle extra configuring - but I am thinking of regular users who need to learn and those, including myself who just want to quickly dip in.

I have no issue adding hundreds of new tools to elara if they fit an easy pattern. Essential it could operate like a shopping list.


To give this proper consideration - I think a good config pattern would be:

[plan_handlers]
~duration_bin_cross_tabs = [[], ['mode'], ['mode', 'purpose'], ['purpose'], ['mode', 'purpose', 'age'], ...]

But as soon as we get into attributes like 'age', the logic changes and the parameterisation becomes project specific, ie does the population have this attribute? In which case it might be more sensible for users to do project specific outputs using 'trips_log' and their own tooling. So we should maybe stick to 'subpopulation' as an attribute. Although we can also have some sensible defaults. ie if 'age' is missing then it does a warning and all ends up in the same bin called 'ages'. There will need to be very good docs.

If we support some combination of {'mode', 'purpose', 'time', 'subpopulation'} then there are ~16 permutations (?) which i agree is a lot of tools to add. 32 if we repeat for durations and distances and then so on...

Hmmmm...

Happy for you to give it a go. But i do not accept it in it's current form.

fredshone commented 3 years ago

In the short term @sarah-e-hayes if you want to get this merged then please remove the options and create separate new tools.

Theodore-Chatziioannou commented 3 years ago

I agree with the need for simplicity; what worries me a bit is the lack of a middle option between the simple- and super-user currently. This is fine when having few projects, being initially set up by lab people, but eventually it would end up either with maintainers being flooded with requests for small changes, or regular users having to deal with a lot more complexity than an advanced config file (and probably give up).

As you say, it's difficult to draw the line between nice-to-have features and project-specific analysis, and it can be a slippery slope if we start adding such options. In most cases, we can say that the trip and leg logs already contain all information required for any likely analysis and stop there. What I would like to change, however, is move any project-specific class which only consists of variable declarations to the config, as optional and nested parameters of their parent handler callers. I can create such a branch and we can discuss there whether we wish to keep or extend that logic to other functions?

I am thinking instead of:

[benchmarks]
london_rods= ["all"]

, with the london_rods being a benchmark class, use:

[benchmarks]
passenger_stop_to_stop:
 - name:'london_rods'
 - path:'london/../../'
 - valid_options:['subway']
 - ...

(I am sure it's will be harder to do than it sounds)

sarah-e-hayes commented 3 years ago

So what I ended up doing was halfway between your two suggestions. I removed the 'mode' and 'purpose' options from being able to be called in the config for the trip_euclid_distance_breakdown handler and instead made the default behaviour to produce three outputs. One is the original binned by distance output, and then two cross tabulations (mode and purpose).

I modified how the breakdown method works slightly to do this - if it's passed a groupby field, then it will append the field to the csv name to distinguish it. I then looped through three options in the handler via a dictionary to get the three flavours of output.

sarah-e-hayes commented 3 years ago

Changed road pricing input file so that one toll road was not tagged. The two tests test cases where the tag is provided and omitted.