NetTopologySuite / NetTopologySuite.IO.ShapeFile

The ShapeFile IO module for NTS.
33 stars 25 forks source link

API review #42

Closed stijnherreman closed 1 month ago

stijnherreman commented 4 years ago

Hi,

I'm looking to migrate away from an internal shapefile library (very old NTS fork + modifications), to this module. While figuring out how to use the API, I've also reviewed it and found that there is room for improvement. Most notably, IEnumerable<T> should be added to the non-Extended classes, but it's not a quick fix that I can do in a small PR. ShapefileDataReader currently outputs an ICustomTypeDescriptor backed by an internal type, which is not ideal.

A number of things that I would like to improve on:

I should be able to do a good part of the necessary work, but I would like input from the current maintainers before going ahead.

Below is the public API of the read-related classes that I reviewed. I have not (yet) gone over write-related classes because they're not the focus of the project I work on. (The list was compiled by hand, so there may be errors.)

Public API

.dbf

NetTopologySuite.IO.DbaseFileReader

Implements: System.Collections.IEnumerable

DbaseFileReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry)
DbaseFileReader(string path)
DbaseFileReader(string path, System.Text.Encoding encoding)
System.Collections.IEnumerator GetEnumerator()
NetTopologySuite.IO.DbaseFileHeader GetHeader()

NetTopologySuite.IO.ShapeFile.Extended.DBaseReader

Implements: System.IDisposable, System.Collections.Generic.IEnumerable<NetTopologySuite.Features.IAttributesTable>

DbaseReader(NetTopologySuite.IO.Streams.IStreamProvider streamProvider)
DbaseReader(string filename)
void Dispose()
System.Collections.Generic.IEnumerator<NetTopologySuite.Features.IAttributesTable> GetEnumerator()
NetTopologySuite.Features.IAttributesTable ReadEntry(int index)

.shp

NetTopologySuite.IO.ShapefileReader

Implements: System.Collections.IEnumerable

ShapefileReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry shapeStreamProviderRegistry, NetTopologySuite.Geometries.GeometryFactory geometryFactory)
ShapefileReader(string filename)
ShapefileReader(string filename, NetTopologySuite.Geometries.GeometryFactory geometryFactory)
System.Collections.IEnumerator GetEnumerator()
NetTopologySuite.Geometries.GeometryCollection ReadAll()
NetTopologySuite.IO.ShapefileHeader Header { get; }

NetTopologySuite.IO.ShapeFile.Extended.ShapeReader

Implements: System.IDisposable

ShapeReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry)
ShapeReader(string shapefilePath)
void Dispose()
System.Collections.Generic.IEnumerable<NetTopologySuite.Geometries.Geometry> ReadAllShapes(NetTopologySuite.Geometries.GeometryFactory geoFactory)
System.Collections.Generic.IEnumerable<NetTopologySuite.IO.Handlers.MBRInfo> ReadMBRs()
NetTopologySuite.Geometries.Geometry ReadShapeAtIndex(int index, NetTopologySuite.Geometries.GeometryFactory geoFactory)
NetTopologySuite.Geometries.Geometry ReadShapeAtOffset(long shapeOffset, NetTopologySuite.Geometries.GeometryFactory geoFactory)
NetTopologySuite.IO.ShapefileHeader ShapefileHeader { get; }

.dbf + .shp

NetTopologySuite.IO.ShapefileDataReader

Implements: System.Data.IDataReader, System.IDisposable, System.Collections.IEnumerable

ShapefileDataReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry, NetTopologySuite.Geometries.GeometryFactory geometryFactory)
ShapefileDataReader(string filename, NetTopologySuite.Geometries.GeometryFactory geometryFactory)
ShapefileDataReader(string filename, NetTopologySuite.Geometries.GeometryFactory geometryFactory, System.Text.Encoding encoding)
void Close()
void Dispose()
bool GetBoolean(int i)
byte GetByte(int i)
long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length)
char GetChar(int i)
long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length)
System.Data.IDataReader GetData(int i)
string GetDataTypeName(int i)
System.DateTime GetDateTime(int i)
decimal GetDecimal(int i)
double GetDouble(int i)
System.Collections.IEnumerator GetEnumerator()
System.Type GetFieldType(int i)
float GetFloat(int i)
System.Guid GetGuid(int i)
short GetInt16(int i)
int GetInt32(int i)
long GetInt64(int i)
string GetName(int i)
int GetOrdinal(string name)
System.Data.DataTable GetSchemaTable()
string GetString(int i)
object GetValue(int i)
int GetValues(object[] values)
bool IsDBNull(int i)
bool NextResult()
bool Read()
void Reset()
NetTopologySuite.IO.DbaseFileHeader DbaseHeader { get; }
int Depth { get; }
int FieldCount { get; }
NetTopologySuite.Geometries.Geometry Geometry { get; }
bool IsClosed { get; }
int RecordCount { get; }
int RecordsAffected { get; }
NetTopologySuite.IO.ShapefileHeader ShapeHeader { get; }
object this[int i] { get; }
object this[string name] { get; }

NetTopologySuite.IO.ShapeFile.Extended.ShapeDataReader

Implements: System.IDisposable

ShapeDataReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry)
ShapeDataReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index)
ShapeDataReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index, NetTopologySuite.Geometries.GeometryFactory geoFactory)
ShapeDataReader(NetTopologySuite.IO.Streams.IStreamProviderRegistry streamProviderRegistry, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index, NetTopologySuite.Geometries.GeometryFactory geoFactory, bool buildIndexAsync)
ShapeDataReader(string shapeFilePath)
ShapeDataReader(string shapeFilePath, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index)
ShapeDataReader(string shapeFilePath, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index, NetTopologySuite.Geometries.GeometryFactory geoFactory)
ShapeDataReader(string shapeFilePath, NetTopologySuite.Index.ISpatialIndex<NetTopologySuite.IO.Handlers.ShapeLocationInFileInfo> index, NetTopologySuite.Geometries.GeometryFactory geoFactory, bool buildIndexAsync)
void Dispose()
System.Collections.Generic.IEnumerable<NetTopologySuite.IO.ShapeFile.Extended.Entities.IShapefileFeature> ReadByMBRFilter(NetTopologySuite.Geometries.Envelope envelope, [bool testGeometriesActuallyInMBR = False])
NetTopologySuite.Geometries.Envelope ShapefileBounds { get; }
FObermaier commented 4 years ago

IMHO you are right that the whole shapefile related API is opaque and redundant. A cleanup/new lean API is appreciated, what do you think @airbreather.

airbreather commented 4 years ago

I totally agree that the API today is horrendous. Not only is it hard to use, but it's also slow and can't really be made to be fast.

In one internal project I've been working on, the reader code was usable at a prototyping stage, but it very quickly became a huge bottleneck to the point that I wound up reimplementing just the parts that we needed.

Even just one feature, DBF encodings, seems to barely work when you actually try to use it.

IMO, the shapefile project is in desperate need of a rewrite, starting from just the minimum needed to implement fast and robust support for reading / writing SHP + SHX and (separately) DBF, and then add layers on top of it to support things like IEnumerable<T> and IDataReader.

I can come up with some outlines for how I think this should look, balancing performance and usability.

airbreather commented 4 years ago

remove IDataReader implementation, doesn't really make sense and some methods have nonsense implementations (e.g. return 0; no matter the method arguments)

To this specific point, IDataReader is a very... comprehensive... interface that supports a lot more than the version of the dBASE format that we implement.

I do endorse exposing an IDataReader implementation, as it (in theory) allows someone to connect their shapefile data source to any of a number of external APIs that use ADO.NET to model their data access.

That said, I do not endorse having that IDataReader implementation be the primary method for reading data from shapefiles as appears to be the case right now.

xivk commented 4 years ago

I agree with @airbreather except for keeping IDataReader but I can live with that if it's not the primary method for reading data as suggested. Also related is this issue https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/issues/2

airbreather commented 4 years ago

I can come up with some outlines for how I think this should look, balancing performance and usability.

I've made a basic start over here: https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/wiki/Ideas-for-new-API-(Reading)

So far, I've just jotted what's been on my mind for reading, since I've been thinking about it a lot recently.

Note that, whereas writing is much more straightforward than reading for most formats, with shapefiles, it's kind-of the opposite. The format makes quite a few tradeoffs to help make it easier to develop an efficient reader, at the expense of making it impossible to develop an efficient forward-only streaming writer unless you have significant a priori knowledge of the entire dataset's contents.

Long story short... writing is more complicated. Even finding the right trade-offs is probably going to be a nontrivial process:

airbreather commented 4 years ago

I've sorta got a start for the writing side of things here: https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/wiki/Ideas-for-new-API-(Writing)

It is a tough challenge to balance efficiency, ease of use, and correctness for this, so especially on the writer side, I'd appreciate feedback on some of the ideas over there...

KubaSzostak commented 3 years ago

I propose to have only one Header - ShapefileHeader which should deliver from DbaseFileHeader. This new Header should have first field assigned to SHAPE column as it is already done internally.

KubaSzostak commented 1 month ago

Most of the issues were resolved by #48.