Process Porto data - Githubissues

BrunoRosendo commented 4 months ago

After #91, I should process the data retrieved from Porto's metro and bus in order to properly run my algorithms. This might include organizing the available tables, using Google Maps if the time/distance between stations is not available or looking for extra data.

I think the trickiest part will be RPP, since I don't have data on the passengers' trips. I may look for this information elsewhere or try to come up with an assumption

BrunoRosendo commented 4 months ago

After analysing the metro's data, it only makes sense to use RPP since metros always go back and forth instead of being in a circle.

I reached the following steps:

[x] We have good information about the geographic places of the stops, which almost directly gives us the locations for the algorithm.
[x] As for the distance matrix, we can use the stop times to determine the time between stops.
[x] The number of vehicles can be considered the number of lines, and I should change this value to compare.
[x] I need to look for the capacity of the metro; that should be easy.
[x] The worst part is the trips since the data doesn't contain passengers' information. The simplest way is to consider every stop in the line as a trip and adjust the capacity accordingly. Another way might be to consider the number of trips on the route as an indicator of more passengers. Otherwise, I need more data or another sort of assumption.
[x] For multiple lines that share stops, I might need to repeat locations. Otherwise, the trips don't make sense
[x] At the end, I can try to put the map of Porto in the background. Displaying the solutions well needs some work

The most typical vehicle has space for 216 people (80 seated + 136 standing): https://pt.wikipedia.org/wiki/Metro_do_Porto#Flexity_Outlook/Eurotram_(MP001_a_MP072) There are other types of vehicle, but this will be my reference

BrunoRosendo commented 4 months ago

As for STCP buses:

[x] There are circular lines, so I can use those as an example of a single bus line with CVRP.
[x] For multiple lines together, I can do the same steps as for the metro. There are A LOT of stops, so choose a few lines wisely.

BrunoRosendo commented 4 months ago

For some reason, RPP is not respecting the "no return to depot" penalty and is ignoring a few stops in the end... I need to figure this out. I could simply give the algorithm more time, I need to test this

I can also check if it's related to the normalization and the really big numbers in the distance matrix, but I don't think so...

Edit: I think the problem is just too big, I need to try with fewer vars

BrunoRosendo commented 4 months ago

The classic solver is also having some trouble. The stops do not correspond with the line I'm testing

Edit: SOLVED, it was a problem when building the distance matrix and location names in remove_unused_locations

BrunoRosendo commented 4 months ago

For some reason, RPP is not respecting the "no return to depot" penalty and is ignoring a few stops in the end... I need to figure this out. I could simply give the algorithm more time, I need to test this

I can also check if it's related to the normalization and the really big numbers in the distance matrix, but I don't think so...

Edit: I think the problem is just too big, I need to try with fewer vars

I tried running in main.py after copying the simplified input from the classic solver and I got the right result... I will check if there's something wrong with how the RPP algorithm simplifies the input

BrunoRosendo commented 4 months ago

For some reason, RPP is not respecting the "no return to depot" penalty and is ignoring a few stops in the end... I need to figure this out. I could simply give the algorithm more time, I need to test this I can also check if it's related to the normalization and the really big numbers in the distance matrix, but I don't think so... Edit: I think the problem is just too big, I need to try with fewer vars

I tried running in main.py after copying the simplified input from the classic solver and I got the right result... I will check if there's something wrong with how the RPP algorithm simplifies the input

Conclusion: Everything is fine with the algorithm, it was just the stop trindade being in a different place in the locations array, even though it doesn't make a difference for the problem definition. However, the optimizer seemed to randomly get to the right solution sooner when using one of the formulations (the simplified one)

BrunoRosendo commented 4 months ago

Here's the RPP algorithm running on the D line stops (not a surprising result, since the line is straightforward)

BrunoRosendo commented 4 months ago

This is line A, with 1000+ vars, running on dwave hybrid system. This is impressive, especially since the local CPLEX optimizer can't even run this on the free version

BrunoRosendo commented 4 months ago

I'm having some trouble stretching the y axis, plotly is not making things easy... A workaround is to set the height of the figure and then zoom out like this

BrunoRosendo commented 4 months ago

Ignoring previous comments, these are the tasks left:

[x] Fix the SELECTED_TRIP_COUNT logic, using 2 disjoint bus lines yielding strange results with it. Using all the stops is too complex.
[x] Circular bus lines with CVRP (in progress, I don't like the current code)
[x] Multiple lines with shared stops (repeat locations?)
[x] Refactor #97
[x] Option to change "distance" to "time" when displaying solution
[x] Fix the multiple line logic
[x] Try to get useful examples with this, otherwise wait for #93

BrunoRosendo commented 4 months ago

Circular line being handled by VRP

26 locations 675 variables, 652 constraints, 3750 biases, 5.3s and 32ms with hybrid cqm solver

BrunoRosendo commented 4 months ago

Regular bus line

BrunoRosendo commented 4 months ago

Multiple bus lines are now working :partying_face: This was 5s on dwave hybrid solver

BrunoRosendo commented 4 months ago

This was 3838 vars/qubits on dwave hybrid solver, in 30 seconds. UHUUU lines 18+304

BrunoRosendo commented 4 months ago

I implemented a solution for multiple lines but my first tests didn't go very well. I should test more and fix the problems

BrunoRosendo commented 4 months ago

After some more testing, I have new conclusions.

RPP (straight lines):

Single line :white_check_mark:
Multiple adjoint lines :white_check_mark:
Multiple lines w/ common stops :x:

This happens because each location is only visited once and, if we try duplicating them, it's impossible to satisfy all trip requests. I tried satisfying only some trips but that resulted in unexpected behaviour from the algorithms. OR tools doesn't even yield a solution.

CVRP (circular lines):

Single line :white_check_mark:
Multiple adjoint lines :x:
Multiple lines w/ common stops :white_check_mark:

Adjoint lines aren't possible because we need a common depot. It's possible to support multiple lines if ALL of them have a common stop (the new depot). I need to duplicate other shared stops (w/ no connection between them) and make sure the dist. matrix is bi-directional, since we're changing the depot

BrunoRosendo commented 4 months ago

Things to do for CVRP (multiple lines):

If there are no common stops, raise exception.
If there isn't a common stop between ALL lines, raise warning and keep going. It's possible that the solution is unfeasible (line w/o connection) or there is a connection but the no. of vehicles used will be smaller.
If lines > 1 set the depot to the most common stop (it's a circle, so this is fine). Otherwise, fewer vehicles would be used.
~If depot moved, remove duplicates of that stop~ We also need to duplicate it to ensure connection to other stops

BrunoRosendo commented 4 months ago

This are lines 300 and 3002 together with 2 common stops and 1 vehicle. The result is not surprising

25 locations 624 variables, 602 constraints, 3456 biases

BrunoRosendo commented 4 months ago

There's still something wrong, sometimes not all connections are possible and I get infeasible results

BrunoRosendo commented 4 months ago

This are lines 300 and 3002 together with 2 common stops and 1 vehicle. The result is not surprising

As we can see, doing the same example with the distance API yields a different result. I think I'm finally able to get something interesting out of this

BrunoRosendo commented 4 months ago

There's still something wrong, sometimes not all connections are possible and I get infeasible results

I think I solved this with the logic below, there were some shenanigans with when I should be duplicating the depot

# Delete duplicates of the new depot. Needed to avoid unfeasible solutions with multiple vehicles.
    to_delete = min(NUM_VEHICLES - 1, most_common_freq - 1)
    i = 1
    while i < len(location_ids) and to_delete > 0:
        if location_ids[i] == most_common_id:
            del location_ids[i]
            del locations[i]
            del location_names[i]
            to_delete -= 1
        else:
            i += 1

Now I can work with 2 lines + 2 vehicles, for example.

or even more vehicles, although the result is the same

BrunoRosendo commented 4 months ago

I believe I can do useful things with my current setup. For example, 3 lines together:

Or at least confirm if the current lines are optimized (order of stops, not client satisfaction).

1821 variables, 131 samples, 30s runtime, 32ms qpu

I can also try to add a few stops if I want. I will create better examples while I'm writing the thesis

THIS CLOSES THE DEVELOPMENT PHASE UHUUU :partying_face:

BrunoRosendo / master-thesis

Process Porto data #92