JuliaGeo / Shapefile.jl

Parsing .shp files in Julia
http://juliageo.org/Shapefile.jl/
MIT License

"ERROR: EOFError: read end of file" on file that opens without issue using other software #100

Closed alex-s-gardner closed 8 months ago

alex-s-gardner commented 9 months ago

I get an "EOFError: read end of file" error when reading a shapefile that opens without issue in QGIS. I poked around a bit but couldn't identify the issue.

using Shapefile
using Downloads  # needed for Downloads.download below

# the shapefiles can be downloaded from NSIDC: https://nsidc.org/data/nsidc-0770/versions/7 
# but you will require an Earth Data Login

# for convenience I've temporarily put the files here
fn = "RGI2000-v7.0-C-02_western_canada_usa"
url2shp =  "https://its-live-data.s3.amazonaws.com/test/$fn.zip"

# download and unzip
Downloads.download(url2shp, "$fn.zip")
run(`unzip $fn`)

# read file
g = Shapefile.Handle("$fn.shp")

ERROR: EOFError: read end of file
Stacktrace:
  [1] unsafe_read(s::IOStream, p::Ptr{UInt8}, nb::UInt64)
    @ Base ./iostream.jl:428
  [2] unsafe_read
    @ ./io.jl:761 [inlined]
  [3] read!
    @ ./io.jl:779 [inlined]
  [4] _readparts
    @ ~/.julia/packages/Shapefile/yESZK/src/utils.jl:7 [inlined]
  [5] read(io::IOStream, #unused#::Type{Shapefile.PolygonZ})
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/polygons.jl:221
  [6] _read_handle_inner(io::IOStream, ::Type{Shapefile.PolygonZ}, header::Shapefile.Header, shapes::Function, index::Nothing; path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:59
  [7] read(io::IOStream, ::Type{Shapefile.Handle}, index::Nothing; path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:45
  [8] read
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:42 [inlined]
  [9] #13
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:20 [inlined]
 [10] open(f::Shapefile.var"#13#14"{String, Nothing}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:395
 [11] open
    @ ./io.jl:392 [inlined]
 [12] Handle
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:19 [inlined]
 [13] Shapefile.Handle(path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:19
 [14] top-level scope
    @ REPL[25]:1

rafaqz commented 9 months ago

It's failing while reading the Z values.

Guess: it's 2D but says somewhere that it's 3D (or we are inferring that wrongly).

So there is no Z data (which would come last), but we are trying to read it anyway.

Is it 2D or 3D? What does QGIS give you?

(It's possible either that QGIS is silently handling a real problem with the file, or that we are misidentifying the file.)
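
One way to see what the file itself declares, independent of QGIS or Shapefile.jl, is to read the shape type straight from the 100-byte main header. The following is a minimal sketch based on the header layout in the ESRI spec; `SHAPE_TYPES` and `shp_header_info` are hypothetical names for illustration and are not part of Shapefile.jl.

```julia
# Shape type codes from the ESRI shapefile spec.
const SHAPE_TYPES = Dict(
    0 => "Null", 1 => "Point", 3 => "PolyLine", 5 => "Polygon", 8 => "MultiPoint",
    11 => "PointZ", 13 => "PolyLineZ", 15 => "PolygonZ", 18 => "MultiPointZ",
    21 => "PointM", 23 => "PolyLineM", 25 => "PolygonM", 28 => "MultiPointM",
    31 => "MultiPatch",
)

# Hypothetical helper (not part of Shapefile.jl): parse the fixed main header.
function shp_header_info(io::IO)
    seekstart(io)
    filecode = ntoh(read(io, Int32))            # bytes 0-3, big-endian, must be 9994
    filecode == 9994 || error("not a .shp file (file code $filecode)")
    skip(io, 20)                                # bytes 4-23 are unused
    nbytes = 2 * Int(ntoh(read(io, Int32)))     # bytes 24-27, length in 16-bit words
    version = Int(ltoh(read(io, Int32)))        # bytes 28-31, little-endian
    code = Int(ltoh(read(io, Int32)))           # bytes 32-35, little-endian
    return (declared_bytes = nbytes, version = version,
            shape_code = code, shape_name = get(SHAPE_TYPES, code, "unknown"))
end

# Convenience method for a file path.
shp_header_info(path::AbstractString) = open(shp_header_info, path)
```

Calling `shp_header_info` on the `.shp` path returns the declared file length in bytes alongside the shape type, so the declared length can also be compared with the actual size on disk.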

alex-s-gardner commented 9 months ago

QGIS loads the shapefile as a MultiPolygonZ

rafaqz commented 9 months ago

OK, well, there is a good chance it's actually an error in the file; see #54 for discussion.

But it could also be a bug here.

It would be very useful if you could verify that either way.

Does QGIS warn, or maybe fill some Z values with zeros?

alex-s-gardner commented 9 months ago

A couple more data points:

  1. Looking at the polygon node values, the Z coordinate is always zero (not sure if this is in the shapefile or added by QGIS).

  2. When I export as a Polygon shapefile, the file can be read in without issue.

  3. If I export the file from QGIS as a PolygonZ shapefile, I get this new error when using Shapefile.Handle:

    ERROR: ArgumentError: invalid Array dimensions
    Stacktrace:
    [1] Array
    @ ./boot.jl:477 [inlined]
    [2] Array
    @ ./baseext.jl:23 [inlined]
    [3] _partvec
    @ ~/.julia/packages/Shapefile/yESZK/src/utils.jl:1 [inlined]
    [4] _readparts
    @ ~/.julia/packages/Shapefile/yESZK/src/utils.jl:6 [inlined]
    [5] read(io::IOStream, #unused#::Type{Shapefile.PolygonZ})
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/polygons.jl:221
    [6] _read_handle_inner(io::IOStream, ::Type{Shapefile.PolygonZ}, header::Shapefile.Header, shapes::Function, index::Nothing; path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:59
    [7] read(io::IOStream, ::Type{Shapefile.Handle}, index::Nothing; path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:45
    [8] read
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:42 [inlined]
    [9] #13
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:20 [inlined]
    [10] open(f::Shapefile.var"#13#14"{String, Nothing}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:395
    [11] open
    @ ./io.jl:392 [inlined]
    [12] Handle
    @ ~/.julia/packages/Shapefile/yESZK/src/handle.jl:19 [inlined]
    [13] Shapefile.Handle(path::String)
    @ Shapefile ~/.julia/packages/Shapefile/yESZK/src/handle.jl:19
    [14] top-level scope
    @ REPL[1]:1

alex-s-gardner commented 9 months ago

The issue seems to be in the polygon read function.

Specifically, for this shapefile numparts does not equal numpoints for the very last polygon. This is consistent across all shapefiles contained in this dataset: https://nsidc.org/data/nsidc-0770/versions/7

Now the question is:

  1. Is this a systematic error in the "https://nsidc.org/data/nsidc-0770/versions/7" shapefiles? -or-
  2. Is this an error in reading PolygonZ shapefiles with Shapefile.jl?

alex-s-gardner commented 9 months ago

Does anyone know of another PolygonZ dataset that I could test on?

rafaqz commented 9 months ago

Hmm, numparts doesn't usually equal numpoints? Do you mean there are no points for the specified parts?

Would the last polygon always be empty?

(There are Z polygons in the test data; have a look at the tests.)

rafaqz commented 9 months ago

Maybe also check against the polygon spec:

https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf

alex-s-gardner commented 8 months ago

Hmm numparts doesnt usually equal numpoints? Do you mean there are no points for the specified parts?

You are right, numparts != numpoints.

alex-s-gardner commented 8 months ago

It is the last feature in the collection that causes the "EOFError: read end of file", which makes me think that either we are picking up an offset somewhere while reading the binary file, or all of the files are corrupt but QGIS reads them without warning.

Here's the PolygonZ spec:

[Screenshot: PolygonZ record layout from the spec]

rafaqz commented 8 months ago

Ohhh, probably that "optional" star next to the M values is the problem!

We don't actually handle that. I guess we need to check the offsets to see if M should be there or not.

We will need a way to embed that in the type too, like an M type parameter that is true or false.
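
As a rough sketch of the offset check, the content length stored in each record header (in 16-bit words) can be compared with the two sizes the spec allows for a PolygonZ record, with and without the optional M block. `polygonz_layout` is a hypothetical helper for illustration, not an existing Shapefile.jl function, and the byte counts are taken from the PolygonZ record layout in the ESRI spec.

```julia
# Hypothetical helper (not part of Shapefile.jl).
# Byte layout of one PolygonZ record's contents, per the ESRI spec:
#   shape type (4) + bounding box (32) + numparts (4) + numpoints (4) = 44
#   parts array:  4 * numparts
#   points array: 16 * numpoints
#   Z range (16) + Z array (8 * numpoints)
#   optional: M range (16) + M array (8 * numpoints)
function polygonz_layout(content_words::Integer, numparts::Integer, numpoints::Integer)
    content_bytes = 2 * content_words            # record header stores 16-bit words
    without_m = 44 + 4 * numparts + 16 * numpoints + 16 + 8 * numpoints
    with_m = without_m + 16 + 8 * numpoints
    content_bytes == with_m && return :z_and_m
    content_bytes == without_m && return :z_only
    return :inconsistent                         # matches neither layout
end
```

A reader could branch on the returned symbol before attempting to read the M range and M array. An `:inconsistent` result would point to a malformed record rather than a missing M block.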

alex-s-gardner commented 8 months ago

Ohhh probably that "optional" star next to the M values is the problem!

That will need to be added, but it doesn't seem to be the issue in this case: when I skip reading the M values and populate them with filler values instead, I still get the same end-of-file error. I'm giving up for now and chalking it up to an issue with the file itself, since zvalues and mvalues are read without issue for all parts except the last.
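
For anyone who wants to test the "problem in the file" hypothesis directly, one option is to walk only the record headers and check whether any record declares more content than the file actually holds. This is a minimal sketch under the record header layout in the ESRI spec (record number and content length, both big-endian `Int32`, length in 16-bit words); `first_truncated_record` is a hypothetical name, not part of Shapefile.jl, and it only validates declared lengths, not the geometry inside each record.

```julia
# Hypothetical helper (not part of Shapefile.jl): returns `nothing` if every record
# fits inside the data, otherwise a NamedTuple describing the first one that does not.
function first_truncated_record(io::IO)
    seekend(io)
    total = position(io)                       # actual size of the data in bytes
    total >= 100 || error("data is shorter than the 100-byte main header")
    seek(io, 100)                              # skip the main header
    while position(io) < total
        start = position(io)
        if total - start < 8                   # not even a full record header left
            return (record = nothing, offset = start,
                    declared_bytes = nothing, available_bytes = total - start)
        end
        recnum = Int(ntoh(read(io, Int32)))    # record number, big-endian
        nbytes = 2 * Int(ntoh(read(io, Int32)))  # content length in bytes
        available = total - (start + 8)
        if nbytes > available                  # record runs past the end of the data
            return (record = recnum, offset = start,
                    declared_bytes = nbytes, available_bytes = available)
        end
        skip(io, nbytes)                       # jump to the next record header
    end
    return nothing
end

# Convenience method for a file path.
first_truncated_record(path::AbstractString) = open(first_truncated_record, path)
```

If this returns `nothing` for the problem file, the record lengths are self-consistent and the mismatch is more likely in how the record contents are interpreted. If it reports the last record, the file is truncated relative to its own headers.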