CityofToronto / bdit_data-sources

Data sources used by the Big Data Innovation Team
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
39 stars 8 forks source link

Use pgrouting for bylaw matching (adding a cost to street names) #268

Closed radumas closed 4 years ago

radumas commented 4 years ago

As part of #265, We could replace and simplify a number of the heuristics in bylaw matching by using pgrouting instead (especially for streets that get offset by another street).

jovenc commented 4 years ago

I found a case where pgRouting routes the shortest path but that's not the path I want. And I can't really filter out the name as that is done after the pgRouting process. But at least I will get no results with this filtering which I just applied.

CREATE or REPLACE FUNCTION jchew.get_lines_btwn_interxn(_highway2 text, _int_start int, _int_end int)
RETURNS TABLE (int_start int, int_end int, seq int, geo_id numeric, lf_name varchar, objectid numeric, geom geometry, fcode integer, fcode_desc varchar)
LANGUAGE 'plpgsql' STRICT STABLE
AS $BODY$

BEGIN
RETURN QUERY
WITH 
results AS (SELECT _int_start, _int_end, * FROM
    pgr_dijkstra('SELECT id, source::int, target::int, cost from gis.centreline_routing_undirected', _int_start::int, _int_end::int, FALSE)
)
SELECT results._int_start, results._int_end, results.seq, 
centre.geo_id, centre.lf_name, centre.objectid, centre.geom, centre.fcode, centre.fcode_desc 
FROM results
INNER JOIN gis.centreline centre ON edge=centre.geo_id
WHERE centre.lf_name = _highway2 -----------------I added this *****
ORDER BY int_start, int_end, seq;

RAISE NOTICE 'pg_routing done for int_start: % and int_end: %', 
_int_start, _int_end;

END;
$BODY$;

This happened for bylaw_id = 6577 where highway = 'Garthdale Court' and between = 'Overbrook Place and Purdon Drive'. The two intersection ids found are 13448816 and 13448300 (marked as red cross below). The blue line is the result from pg_routing whereas the highlighted yellow path indicates the road segment from the bylaw.

image

radumas commented 4 years ago

Two possible paths forward on this:

  1. Modify the costs for the desired lf_name, e.g.: 'SELECT id, source::int, target::int, CASE lf_name WHEN '|| _highway2 || ' THEN 0.75*cost ELSE cost END AS cost from gis.centreline_routing_undirected'
  2. pre-filter the network to only include the desired street name, this will break with the multi-line string issue

Both would require joining gis.centreline_routing_undirected onto the centreline to get more attributes, or include lf_name in the VIEW creation.

jovenc commented 4 years ago

I tried adding the cost for pgrouting

SELECT _int_start, _int_end, * FROM
    pgr_dijkstra('SELECT id, source::int, target::int,
                 CASE lf_name WHEN ''|| _highway2 ||'' THEN (0.75*cost)::float ELSE cost END AS cost 
                 from gis.centreline_routing_undirected_lfname'::TEXT,
                 _int_start::bigint, _int_end::bigint, FALSE)

BUT they still route me to the same path which does not contain the same street name as highway2. I tried changing (0.75*cost) to various number and even (0.05*cost) but I still get the same path?

jovenc commented 4 years ago

Since we would need a double quote (") to specify the column highway2, another single quote (') would be required in front of the double quote to escape it withing a string.

For example, ''highway2'' would mean the string 'highway2' in the SELECT query whereas '''highway2''' would mean the column "highway2" where we have the street 'ABC St'. Therefore that part of the query should look like below.

SELECT _int_start, _int_end, * FROM
    pgr_dijkstra('SELECT id, source::int, target::int,
                 CASE lf_name WHEN '''|| _highway2 ||''' THEN (0.7*cost)::float ELSE cost END AS cost 
                 from gis.centreline_routing_undirected_lfname'::TEXT,
                 _int_start::bigint, _int_end::bigint, FALSE)

There is also another cleaner way to do it which is by using the function format, more information can be found here. Therefore, another way to write the above would be

SELECT _int_start, _int_end, * FROM
    pgr_dijkstra(format('SELECT id, source::int, target::int,
                 CASE lf_name WHEN %L THEN (0.7*cost)::float ELSE cost END AS cost 
                 from gis.centreline_routing_undirected_lfname'::TEXT, _highway2),
                 _int_start::bigint, _int_end::bigint, FALSE)

I'm using the second option for clarity purposes. Also, (0.7*cost) is used for now as that seems to work for most of the routes that I'm looking for. Might modify later depending on the results.

radumas commented 4 years ago

Shouldn't the cost also be based on levenshtein distance?

jovenc commented 4 years ago

The current function jchew.get_lines_btwn_interxn only takes in highway, int1 and int2 and adding levenshtein distance might add complexity. Instead of having the cost to be based on levenshtein distance, I have instead WHERE levenshtein(TRIM(centre.lf_name), TRIM(_highway2), 1, 1, 1) < 3. The magic number 3 is chosen after considering various cases.

radumas commented 4 years ago

I'm just worried that if the filtering after routing was an issue because of minor differences in spelling, it could also be an issue for finding the shortest path.

format('SELECT id, source::int, target::int,
                 CASE WHEN levenshtein(TRIM(lf_name), TRIM(%L), 1, 1,1) < 3 THEN (0.7*cost)::float ELSE cost END AS cost 
                 from gis.centreline_routing_undirected_lfname'::TEXT, _highway2)
jovenc commented 4 years ago

Wowww, after including the above into the function. It seems like only 15 bylaws fail at that stage instead of 84 bylaws. Re-running the speed limit processing function and will see if the results are better.