Closed danielolsen closed 2 years ago
This is interesting! I'm wondering the performance of this algorithm. What is the maximum number of segments for a line in the dataset? We are creating a pretty dense graph by adding edges between all end-points of segments for a line. The overall time complexity is bounded by O(N*V^2logV) where N is the number of lines, V is the number of maximum segments. Not sure how long it takes to go through the current transmission dataset.
Just another random thought (might be a bad idea;)): assuming all the line segments roughly form a straight line rather than circles, would it be sufficient to project one of the end points of each segment onto an axis in a 3D cartesian coordinates system (latlong_to_xyz
) to get some sort of partial ordering? For example, consider three segments with start points being (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), sort the points based on the projections of the axis with maximum span (suppose max(x1, x2, x3) - min(x1, x2, x3) > max(y1, y2, y3) -min(y1, y2, y3) > max(z1, z2, z3) - min(z1, z2, z3)
, then x axis is chosen). In this way, a partial ordering of segments is determined. It won't tell us the start/end info and not able to fill gaps between segments though. Also, not sure how it performs over the noisy source data.
For the July 2020 dataset, we have 89,744 total lines, of which 80 need joining. On my laptop, this step takes about three seconds to get through all the lines (including the single-segment lines where we're just identifying that the list has length one and returning the first element). The vast majority of these lines have either 2 or 3 segments, although there's one line with 5, one line with 10, and one line with 33. I believe that when I was looking at the February 2022 dataset, I didn't see a 33-segment line, but I did see a 14-segment line (maybe in its place?).
@danielolsen Thanks for looking into the statistics, <0.1% is super light. Then I presume this is minor. Probably it isn't worth the time to explore other possibilities. Given what we have works for now, let's stick with it.
Pull Request doc
Purpose
The HIFLD transmission line paths are defined by either: a list of lat/lon points (which I'll call a 'segment' from hereon out), or a list of such lists. The Jul 2020 dataset values seem to always be a list of segments, but the February 2020 dataset is either a single segment (not wrapped in a list) or a list of segments.
Previously, when we had a list of segments, we always grabbed the first one, discarding all the rest. This PR changes the logic so that if we get a list of segments of length greater than one, we make an assumption about how to reconnect them. The logic is a little complex because the segments aren't 'drawn' in any particular order, don't necessary connect, and there are some small noisy segments that need to be discarded.
What the code is doing
Within
helpers.py
, we copy over a couple of helper functions for calculating distances between coordinate pairs.Within
load.py
, we add a new function_join_line_segments
to take a list of segments and parse them._generate_linear_spanning_tree
which:_generate_linear_spanning_tree
, and if the tree isn't a chain, the smallest segment is dropped and the process is repeated until we have a chain tree.Testing
Tested manually. Calling
get_hifld_electric_power_transmission_lines
works, as well ascreate_grid
.Time estimate
30-60 minutes. The code is not too complex, but the logic is, since it needs to handle many edge cases found in the source data.