b-r-u / osmpbf

A Rust library for reading the OpenStreetMap PBF file format (*.osm.pbf).
Apache License 2.0
122 stars 19 forks source link

Missing nodes from way #7

Closed Timmmm closed 3 years ago

Timmmm commented 3 years ago

I've written a simple program to extract all of the highways from a file and insert them into a SQLite database. Unfortunately it seems like read_ways_and_deps() gives me a Way but doesn't give me one of the Nodes it contains. If I run it I get this error:

Got 2205065 ways and 9188804 nodes
Insert; 0
Error: Node 8188406987 is missing

I may have misunderstood the data model. It would really help if you used TiVec and typed IDs. Like this:

#[derive(Clone, Copy, Debug, Default, Eq, From, Hash, Into, Ord, PartialEq, PartialOrd)]
pub struct NodeId(pub i64);

It's a little more work but way easier to follow.

Anyway here's the complete code. I would really appreciate it if you could figure out what I'm doing wrong! The relevant bit is insert_data(). (Also feel free to use this as an example if it can be fixed.)

use anyhow::{anyhow, bail, Result};
use osmpbf::{IndexedReader, Element};
use std::path::Path;
use std::fs::{File, remove_file};
use rusqlite::params;
use std::collections::HashMap;

fn main() -> Result<()> {

    let path = Path::new("map.sqlite");

    let delete_result = remove_file(path);
    if matches!(&delete_result, Err(e) if e.kind() != std::io::ErrorKind::NotFound) {
        bail!("Couldn't delete {:?}: {:?}", path, delete_result);
    }

    let db = rusqlite::Connection::open_with_flags(
        path,
        rusqlite::OpenFlags::SQLITE_OPEN_READ_WRITE |
        rusqlite::OpenFlags::SQLITE_OPEN_CREATE,
    )?;
    create_tables(&db)?;

    let reader = IndexedReader::from_path("netherlands.osm.pbf")?;
    insert_data(&db, reader)?;

    Ok(())
}

fn create_tables(db: &rusqlite::Connection) -> Result<()> {
    eprintln!("Creating tables");
    db.execute_batch("
    PRAGMA journal_mode = OFF;
    PRAGMA locking_mode = EXCLUSIVE;
    CREATE VIRTUAL TABLE objects USING rtree(
        id,              -- Integer primary key
        minX, maxX,      -- Minimum and maximum X coordinate (f32)
        minY, maxY,      -- Minimum and maximum Y coordinate (f32)
        -- The following are auxiliary columns.
        +boundary BLOB   -- detailed boundary of object
    );
")?;
    Ok(())
}

fn insert_data(db: &rusqlite::Connection, mut reader: IndexedReader<File>) -> Result<()> {
    eprintln!("Inserting data");

    let mut way_count = 0u64;
    let mut element_count = 0u64;

    let mut ways = Vec::<Vec<i64>>::new();
    let mut nodes = HashMap::<i64, (f64, f64)>::new();

    reader.read_ways_and_deps(
        |way| {
            if way_count % 100000 == 0 {
                eprintln!("Ways; {}", way_count);
            }
            way_count += 1;
            way.tags().any(|(key, _value)| key == "highway")
        },
    |element| {
            if element_count % 100000 == 0 {
                eprintln!("Elements; {}", element_count);
            }
            element_count += 1;
            match element {
                Element::Node(node) => {
                    nodes.insert(node.id(), (node.lat(), node.lon()));
                }
                Element::DenseNode(node) => {
                    nodes.insert(node.id, (node.lat(), node.lon()));
                }
                Element::Way(way) => {
                    ways.push(way.refs().collect());
                },
                _ => {}
            }
        },
    )?;

    eprintln!("Got {} ways and {} nodes", ways.len(), nodes.len());

    // minX, maxX, minY, maxY, boundary
    let mut way_insert_statement = db.prepare("INSERT INTO objects VALUES (NULL, ?1, ?2, ?3, ?4, ?5)")?;

    for (i, way) in ways.into_iter().enumerate() {
        if i % 100000 == 0 {
            eprintln!("Insert; {}", i);
        }
        // Iterate through the nodes, building a boundary blob and calculating
        // the bounding box.

        if way.is_empty() {
            continue;
        }

        let mut min_lon = f64::MAX;
        let mut max_lon = f64::MIN;
        let mut min_lat = f64::MAX;
        let mut max_lat = f64::MIN;

        let mut boundary: Vec<u8> = Vec::new();

        for node_ref in way {
            let node = nodes.get(&node_ref).ok_or_else(|| anyhow!("Node {} is missing", node_ref))?;

            min_lon = min_lon.min(node.0);
            max_lon = max_lon.max(node.0);
            min_lat = min_lat.min(node.1);
            max_lat = max_lat.max(node.1);

            boundary.extend_from_slice(&node.0.to_le_bytes()); // TODO: Fancy encoding.
            boundary.extend_from_slice(&node.1.to_le_bytes());
        }

        way_insert_statement.execute(params![min_lon, max_lon, min_lat, max_lat, boundary])?;
    }
    Ok(())
}

Btw it's weird that DenseNode's id is a field, but Node's id is a method.

Thanks for the project!

b-r-u commented 3 years ago

Thanks for this issue and your observations! I will look into it later today. I was already thinking about introducing typed indices but didn't know about typed_index_collections.

b-r-u commented 3 years ago

I tested three different files from Geofabrik (including http://download.geofabrik.de/europe/netherlands-latest.osm.pbf) but could not reproduce this issue. Could you point me to a file that generates this error?

Timmmm commented 3 years ago

Weird, yeah it's this one: http://download.openstreetmap.fr/extracts/europe/netherlands.osm.pbf (1.2 GB)

SHA1 hash:

$ shasum netherlands.osm.pbf
6570f4cc23546d266c0d7f72ce13bd0bffbe3417  netherlands.osm.pbf
b-r-u commented 3 years ago

It looks like this file is inconsistent. There is a way that contains a reference to a node that is not included in the file.

    {
        println!("start checking");
        let reader = ElementReader::from_path("netherlands.osm.pbf")?;
        let needle = 8188406987;

        reader.for_each(
            |element| match element {
                Element::DenseNode(n) => {
                    if n.id == needle {
                        println!("found node {}", needle);
                    }
                },
                Element::Node(n) => {
                    if n.id() == needle {
                        println!("found node {}", needle);
                    }
                },
                Element::Way(w) => {
                    if w.refs().any(|node_id| node_id == needle) {
                        println!("way {} contains node {}", w.id(), needle);
                    }
                },
                _ => {}
            }
        )?;
        println!("done checking");
    }

When adding this code it prints:

start checking
way 5168742 contains node 8188406987
done checking

So the node 8188406987 is missing :(

b-r-u commented 3 years ago

I did some research. It seems to be a common issue that referential integrity is not guaranteed for all OSM extracts:

Timmmm commented 3 years ago

Ah interesting! Thanks for all the help, guess I'll just skip ways with missing refs (or use the file you linked). Not an issue in your library anyway!

Thanks for the great project. I'm using it to make a web server that runs on your phone and serves a map of your location, with the idea being you can connect a kindle to it to use as a satnav when cycling. Phones just don't work when it's sunny.

Got this so far:

img

Needs a bit of work (e.g. projection is wrong, probably should use different styles for different roads and maybe add rivers), but it's probably good enough for satnav if the route is overlaid.

Anyway, thanks again for the helpful library!