Fix naming: `bundleId` -> `graphId` -> `networkId` ... choose 1?

trevorgerhardt commented 2 years ago

An incoming single point analysis request contains the bundleId and set's it on the AnalysisWorkerTask as the graphId here: https://github.com/conveyal/r5/blob/f71de5235a8ec773e91f3ddbec134bb5f5fecdb2/src/main/java/com/conveyal/analysis/models/AnalysisRequest.java#L202 Later, the worker task is used in the NetworkPreloader and set's the graphId to be a networkId here: https://github.com/conveyal/r5/blob/f71de5235a8ec773e91f3ddbec134bb5f5fecdb2/src/main/java/com/conveyal/r5/analyst/NetworkPreloader.java#L182

If possible, we should the same name when we are referring to the same thing. We can allow for more descriptive versions, but we should choose one of bundle or graph or network if we can.

Next step would be to find all of the different areas we are using each of the names.

`graphId`

AnalysisWorkerTask
WorkerCategory

`bundleId`

AnalysisRequest

`networkId`

NetworkPreloader
TransportNetworkCache

`scenarioId`

TransportNetwork

trevorgerhardt commented 2 years ago

Ah, the story gets deeper. In TransportNetwork we also use scenarioId to refer to the same thing:

https://github.com/conveyal/r5/blob/f71de5235a8ec773e91f3ddbec134bb5f5fecdb2/src/main/java/com/conveyal/r5/transit/TransportNetworkCache.java#L171

abyrd commented 2 years ago

The story behind this is that the one big OTP data structure containing both transit and streets has always been called a "graph". When we made R5 the transit was no longer explicitly represented as a graph, and we thought it better to use a name from the domain (transportataion) instead of what was used to model it (a graph). These terms got baked into various components that used to be separate projects used at different times in both OTP and R5 contexts.

Bundle, network, and scenario really are three different things. I never liked the term "bundle" or the fact that multiple feeds and OSM files were permanently fused together into a bundle at the moment they were uploaded - I prefer having free-floating "data sources" that are assembled at the moment we manually trigger a network build, but we haven't fully realized that yet.

So we really do have Bundles, Graphs, Networks, and Scenarios, we've just tried to use the same ID for all of them by convention (or suffixed the same ID when there's a one-to-many relationship like scenarios) so you don't have to look up which bundle produced which network etc.

So I think the best approach is to actually remove graphId (call everything networkId) and eliminate bundleId by manually triggering network builds. A scenario ID really is distinct from a network ID though it is tied to one specific network ID.

conveyal / r5