Bondify / gtfs_functions

Package with useful functions to create geo-spatial visualizations from a GTFS.
MIT License

Linestrings are incorrect when importing shapes.txt which isn't pre-sorted #37

Closed Danishswag closed 7 months ago

Danishswag commented 8 months ago

Hello! Thank you for putting together such a wonderful and easy-to-use package. At the transit agency I work at, the shapes.txt files in our GTFS feed archive aren't pre-sorted on the shape_pt_sequence column. This means that feed.shapes winds up connecting points in the wrong order unless I manually extract shapes.txt, sort it, and recreate the GTFS bundle before using the package to import it.
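For reference, that manual workaround can be sketched with only the Python standard library. This is an illustration, not part of gtfs_functions: the function name `rewrite_with_sorted_shapes` is hypothetical, and the sketch assumes shapes.txt sits at the root of the zip, is UTF-8 encoded, and holds integer shape_pt_sequence values.

```python
import csv
import io
import zipfile


def rewrite_with_sorted_shapes(src_zip, dst_zip):
    """Copy a GTFS zip, sorting shapes.txt by (shape_id, shape_pt_sequence).

    Hypothetical helper for illustration only. Both arguments may be
    paths or binary file-like objects.
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "shapes.txt":
                # "utf-8-sig" tolerates a leading byte-order mark.
                text = data.decode("utf-8-sig")
                reader = csv.DictReader(io.StringIO(text))
                # Cast the sequence to int so that 10 sorts after 2.
                rows = sorted(
                    reader,
                    key=lambda r: (r["shape_id"], int(r["shape_pt_sequence"])),
                )
                out = io.StringIO()
                writer = csv.DictWriter(
                    out, fieldnames=reader.fieldnames, lineterminator="\n"
                )
                writer.writeheader()
                writer.writerows(rows)
                data = out.getvalue().encode("utf-8")
            # Every other file is copied through unchanged.
            dst.writestr(name, data)
```

All files other than shapes.txt are copied byte for byte, so the rest of the feed is left as the agency published it.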

I believe the issue is in get_shapes(self) in the Feed class:

https://github.com/Bondify/gtfs_functions/blob/b7caea628c681a41b42b922042afbe565e133038/gtfs_functions/gtfs_functions.py#L691C15-L691C15

At a quick glance, I think something like changing this code block:

            aux = extract_file('shapes', self)
            shapes = aux[["shape_id", "shape_pt_lat", "shape_pt_lon"]]\
                .groupby("shape_id")\
                    .agg(list)\
                        .apply(lambda x: LineString(zip(x[1], x[0])), axis=1)

to something like:

            aux = extract_file('shapes', self)
            if "shape_pt_sequence" in aux.columns:
                aux = aux.sort_values(by=["shape_id", "shape_pt_sequence"])
            shapes = aux[["shape_id", "shape_pt_lat", "shape_pt_lon"]]\
                .groupby("shape_id")\
                    .agg(list)\
                        .apply(lambda x: LineString(zip(x[1], x[0])), axis=1)

should fix it. Technically, shape_pt_sequence is required by the spec^1, so the if check might be unnecessary.
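To illustrate why the ordering matters, here is a small dependency-free sketch of the same sort-then-group idea. It does not use the package's code: the sample data and the function name `ordered_shape_coords` are made up for the example, and it builds plain (lon, lat) lists rather than LineString objects.

```python
import csv
import io
from collections import defaultdict

# Hypothetical shapes.txt content with rows deliberately out of order.
SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,0.0,2.0,3
A,0.0,0.0,1
B,5.0,5.0,1
A,0.0,1.0,2
A,0.0,10.0,10
B,5.0,6.0,2
"""


def ordered_shape_coords(shapes_txt):
    """Return {shape_id: [(lon, lat), ...]} ordered by shape_pt_sequence."""
    rows = list(csv.DictReader(io.StringIO(shapes_txt)))
    # Sort by shape first, then by the integer sequence within each shape.
    # Comparing the raw strings would put "10" before "2".
    rows.sort(key=lambda r: (r["shape_id"], int(r["shape_pt_sequence"])))
    coords = defaultdict(list)
    for r in rows:
        coords[r["shape_id"]].append(
            (float(r["shape_pt_lon"]), float(r["shape_pt_lat"]))
        )
    return dict(coords)


coords = ordered_shape_coords(SHAPES_TXT)
print(coords["A"])  # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
```

Without the sort, shape A would be traced in file order (lon 2, 0, 1, 10), which is the kind of zig-zag line described above.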

If you think this is a good solution, I'd be happy to submit a pull request - didn't see a contribution guide so didn't want to presume.

Bondify commented 7 months ago

Hi @Danishswag, thanks for raising the issue and proposing a solution. The latest version of the package sorts by shape_pt_sequence. I should definitely work on a contribution guide, as input from the community would be very helpful. I'll close the issue for now. Hopefully the contribution guide will be up by the time you find the next issue.