amandasaurus / osmio

Read & write OSM file formats
Apache License 2.0

fails to parse overpass-turbo export due to BOM #7

Open michaelkirk opened 3 years ago

michaelkirk commented 3 years ago

e.g. the default overpass-turbo script:

/*
This is an example Overpass query.
Try it out by pressing the Run button above!
You can find more examples with the Load tool.
*/
node
  [amenity=drinking_water]
  ({{bbox}});
out;

  1. Click Run.
  2. Click Export.
  3. Download or copy the result as raw OSM data.

You'll get a file like this (though unzipped): overpass-export-bom.osm.gz

When I try to process it, the osmio parser explodes with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 1:1, kind: Syntax("Unexpected characters outside the root element: \u{feff}") }', /Users/mkirk/src/georust/osmio/src/xml/mod.rs:65:25
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
   2: core::option::expect_none_failed
             at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
   3: core::result::Result<T,E>::unwrap
             at /Users/mkirk/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1037:23
   4: <osmio::xml::XMLReader<R> as osmio::OSMReader>::next
             at /Users/mkirk/src/georust/osmio/src/xml/mod.rs:65:22
   5: <osmio::OSMObjectIterator<R> as core::iter::traits::iterator::Iterator>::next
             at /Users/mkirk/src/georust/osmio/src/lib.rs:505:9
   6: osm2fgb::convert_xml
             at ./src/main.rs:63:20
   7: osm2fgb::main
             at ./src/main.rs:46:5
   8: core::ops::function::FnOnce::call_once
             at /Users/mkirk/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5

If I open the file in vim, run :set nobomb, and save it as overpass-export-nobom.osm.gz, then osmio processes the input successfully.

michaelkirk commented 3 years ago

Maybe a dupe of https://github.com/netvl/xml-rs/issues/155

michaelkirk commented 3 years ago

So it seems that this is a known issue: xml_rs returns a syntax error when the input starts with a BOM.

The author's position is that, since the BOM exists "outside of the xml", this should be handled by each user of the crate, who should strip any BOM before handing input to xml_rs.

Their suggestion is to build something like:

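use std::fs::File;
use std::io::BufReader;

use bom_remover::BomRemover;
use xml::reader::EventReader;

let file = File::open("file.xml").unwrap();
let file = BufReader::new(file);
// Strip any leading BOM before the XML parser sees the input.
let file = BomRemover::new(file);
let reader = EventReader::new(file);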
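
If pulling in another crate is not desirable, the same idea can be sketched with only the standard library. This is a minimal sketch, not osmio's or xml_rs's actual fix: `strip_utf8_bom` is a hypothetical helper name, and it assumes the input is UTF-8, so it only recognises the UTF-8 BOM (bytes EF BB BF) and leaves UTF-16 BOMs untouched.

```rust
use std::io::{self, Cursor, Read};

/// UTF-8 encoding of U+FEFF, the character reported in the panic message.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of osmio or xml_rs): wraps a reader so that
/// a leading UTF-8 BOM, if present, is dropped before any bytes are handed on.
fn strip_utf8_bom<R: Read>(mut inner: R) -> io::Result<impl Read> {
    // Read up to three bytes; a single read() call may return fewer.
    let mut head = [0u8; 3];
    let mut filled = 0;
    while filled < head.len() {
        match inner.read(&mut head[filled..]) {
            Ok(0) => break, // end of input before three bytes arrived
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }

    // Drop the bytes if they are exactly the BOM, otherwise put them back
    // in front of the rest of the stream.
    let keep = if head[..filled] == UTF8_BOM[..] {
        Vec::new()
    } else {
        head[..filled].to_vec()
    };
    Ok(Cursor::new(keep).chain(inner))
}
```

The returned value implements `Read`, so it could presumably be passed to `EventReader::new` in place of the `BomRemover` wrapper above.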