eumiro / pygohome

A 100% personal route optimizer in a known environment based on experience
https://eumiro.github.io
MIT License

import gpx without gpsbabel #4

Closed · philshem closed this 4 years ago

philshem commented 4 years ago

based on this tweet: https://twitter.com/eumiro/status/1257008378800484352?s=20

> Also I want to find a fast Pythonic way to read the GPX files to avoid (fast but still an extra step) gpsbabel.

I've had success with gpxpy (and I think the slow datetime parsing has since been improved).

Basic usage:

import gpxpy

with open(filename, 'rb') as gpx_file:
    gpx = gpxpy.parse(gpx_file, parser='lxml')
    # gpx = gpxpy.parse(gpx_file, parser='minidom')

# loop through the GPX file and extract each point
for track in gpx.tracks:
    for segment in track.segments:
        for p in segment.points:
            ...  # do something with p, e.g. p.latitude, p.longitude, p.time

To convert lat/lon to the UTM grid, you can use this package: https://github.com/Turbo87/utm

import utm

utm.from_latlon(51.2, 7.5)
# -> (395201.3103811303, 5673135.241182375, 32, 'U')
#    i.e. (easting, northing, zone number, zone letter)

My opinion: rather than converting to a table/CSV, I'd just convert the GPX to a list of dicts (one per GPS point) using the above tools, then build a DataFrame with pandas.DataFrame.from_records() (see the sketch below).
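
A minimal sketch of that approach, assuming a hypothetical track.gpx (the field names are illustrative):

import gpxpy
import pandas as pd

with open("track.gpx") as f:
    gpx = gpxpy.parse(f)

# one dict per GPS point
records = [
    {"lat": p.latitude, "lon": p.longitude, "time": p.time}
    for track in gpx.tracks
    for segment in track.segments
    for p in segment.points
]

df = pd.DataFrame.from_records(records)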

eumiro commented 4 years ago

Thank you. I actually started with gpxpy and utm originally (I'm already using utm to convert back to lat/lon for plotting in ipyleaflet), but it was comparatively much slower (we're talking about 60k+ points, representing over 900 km of tracks) when loading all the raw GPX files each time:

from pathlib import Path

import gpxpy
import utm


def read_segments():
    for path in sorted(Path(".").glob("gpx/*.gpx")):
        gpx = gpxpy.parse(path.read_text())
        for track in gpx.tracks:
            for segment in track.segments:
                if len(segment.points) < 5:
                    continue
                # per point: seconds since segment start + UTM (easting, northing)
                yield [
                    (point.time_difference(segment.points[0]),
                     utm.from_latlon(point.latitude, point.longitude, 32, "U")[:2])
                    for point in segment.points
                ]


segments = list(read_segments())

The UTM system is perfect for local geometry and distances, but with GeoPandas the expensive conversion could perhaps be skipped.
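
For comparison, a sketch of what the GeoPandas route might look like (the lats/lons lists are hypothetical; EPSG:32632 is UTM zone 32N, matching the hard-coded zone in the code above):

import geopandas as gpd

lats = [51.2, 51.25, 51.3]
lons = [7.5, 7.55, 7.6]

# build points in WGS84, then project everything in one bulk to_crs call
gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy(lons, lats), crs="EPSG:4326")
gdf_utm = gdf.to_crs(epsg=32632)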

Part of the performance problem is also the conversion of ALL points, which could be addressed with gpxpy's simplification methods.
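
For instance, gpxpy ships a Ramer-Douglas-Peucker simplification; a minimal sketch, assuming a hypothetical track.gpx:

import gpxpy

with open("track.gpx") as f:
    gpx = gpxpy.parse(f)

# drop points that deviate less than 5 m from the simplified track
gpx.simplify(max_distance=5)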

Or some sort of database/cache for all converted points. And we're back to CSVs, which offer the user a stable and reviewable step in data processing.
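
A cache along these lines, for example (convert_all_points is a hypothetical stand-in for the expensive GPX-to-UTM conversion):

from pathlib import Path

import pandas as pd

cache = Path("points_utm.csv")
if cache.exists():
    points = pd.read_csv(cache)  # reuse previously converted points
else:
    points = convert_all_points()  # hypothetical expensive conversion
    points.to_csv(cache, index=False)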

philshem commented 4 years ago

Another option is to run gpsbabel from within the Jupyter notebook using the shell escape (!):

https://intro.syzygy.ca/unix-tricks/#unix-and-magic-in-a-notebook

(You would first have to fetch a local gpsbabel binary, e.g. via wget or git clone.)
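
A notebook cell could then look like this (a sketch, assuming the gpsbabel binary is on the PATH and a hypothetical track.gpx):

# convert GPX to a flat CSV via gpsbabel's unicsv output format
!gpsbabel -i gpx -f track.gpx -o unicsv -F track.csv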

eumiro commented 4 years ago

utm can do NumPy arrays, it's just too shy about it, so I added that to its docs: https://github.com/Turbo87/utm/pull/50. Converting 1M points in a Python loop takes 90 seconds; a 1M-element NumPy array needs 0.26 s. Now it's an option again.
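
The vectorized call looks like this (a minimal sketch; with NumPy installed, utm accepts arrays directly and returns easting/northing arrays plus the zone info):

import numpy as np
import utm

lats = np.array([51.2, 51.25, 51.3])
lons = np.array([7.5, 7.55, 7.6])

# one vectorized call instead of a Python loop over the points
easting, northing, zone_number, zone_letter = utm.from_latlon(lats, lons)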

eumiro commented 4 years ago

gpxpy and utm are now used to load GPX files, as of https://github.com/eumiro/pygohome/commit/1a30a69d8f0089625e8ef05976fc6b2b947df067. Thanks @philshem.