bilaldursun1 / nettopologysuite

Automatically exported from code.google.com/p/nettopologysuite
0 stars 0 forks source link

Improving ShapefileReader #177

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm not pleased with the implementations of ShapefileDataReader, 
ShapefileReader, DbaseFileReader since they are not efficient with large 
shapefiles. They don't support reading geometries on a random access base.

I'm going to improve those classes so they'll first make a spatial index from 
the MBR of the geometries (and not waste time parsing the whole geometry). In 
this way I can get geometries that intersect in an arbitrary envelope more 
efficiently than iterating through all of them.

I hope you'll agree to adopt my improvements when they'll be ready and guide me 
through your standards.

Original issue reported on code.google.com by ronen.sc...@gmail.com on 9 Mar 2014 at 7:04

GoogleCodeExporter commented 9 years ago
we hope to review your patch soon :)

Original comment by diegogu...@gmail.com on 10 Mar 2014 at 1:04

GoogleCodeExporter commented 9 years ago
Hi.

The attached .RAR archive contains the changes we made to the shape reader 
classes.
The solution contains 2 projects: the main one contains files we changed, along 
with some other classes that are marked internal so we had to copy them to our 
project.
The other project contains unit tests we wrote to test the added functionality 
we added.

There could be some tests that are duplicated with your existing tests, as we 
don't use NUnit and couldn't run your tests.

Some main changes we made:
* We implemented spatial queries on Shapefiles, using the SRTree implementaion 
present in NTS.
* We Implemented the returned type IFeature as a lazy implementation, meaning 
that both the geometric data and the metadata (from the dbf) are read in a lazy 
manner.
* We fixed a bug in the PointHandler class - We encountered a Shapefile of type 
PointMZ where there wasn't really an M coordinate, and the shape size 
represented that lack of a coordinate.
The old code just read the data according to the type, rather than checking the 
number of read bytes, we fixed that.
* Added the ability to read shapes and metadata (from dbf) in a random access 
manner - by index.
* Added the ability to use custom SpatialIndex in the ShapeDataReader class.
* Added the ability to determine whether the indexing in the ShapeDataReader 
class would be performed syncronously, thereby blocking the constructor method 
until its done, or asyncronously in a Task.

We hope you accept our patch, and if not we'd appreciate any constructive 
criticism or comments.

Original comment by ronen.sc...@gmail.com on 2 Apr 2014 at 12:07

Attachments:

GoogleCodeExporter commented 9 years ago
I read my comment after posting and realized it lacked a short description of 
the classes we changed:

*ShapeReader - A class used to read geometric data from .SHP files.
Able to provide an enumerator for reading the actual shape data, or for reading 
the MBRs of the shapes.
Also able to read in a random access manner - by index.

* DBaseFileReader - A class used to read metadata from .DBF files.
Able to provide an enumerator for reading the rows sequentialy.
Also able to read by index.

ShapeDataReader - A class that reads features.
The return value is of type IFeature, to enable easy type change.
This class uses the 2 formerly mentioned classes, and offer the functionality 
of reading all features in given MBR, using a SpatialIndex for performance.

It accepts in its constructor an instance of ISpatialIndex in case the user 
wants to use their own implementaion of a spatial index, or uses SRTreeIndex by 
default.

In addition, we created the ShapefileFeature class which implements the 
IFeature interface in a lazy manner.

The other classes we mostly edited to fit our needs.

Original comment by ronen.sc...@gmail.com on 2 Apr 2014 at 12:19

GoogleCodeExporter commented 9 years ago
thanks for submitting this code.
unfortunately, to me is very hard to detect the changes to the code and then 
review/apply your modifies.
can you build a svn patch, at least for the core code? 
http://ariejan.net/2007/07/03/how-to-create-and-apply-a-patch-with-subversion/

ShapeDataReader looks having the same purpose of ShapefileDataReader, or I'm 
wrong?

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 7:18

GoogleCodeExporter commented 9 years ago
another susgestion: when you build the patch file, resist to the "natural 
force" (at least to me) to apply your own code style, and try to modify fewer 
lines as possible.
this greatly simplify the patch review.
thanks.

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 7:20

GoogleCodeExporter commented 9 years ago
aniway, I'm trying to integrate your changes as a separate project, merging 
only the changes actually mandatory. hope you can review my code soon.

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 7:53

GoogleCodeExporter commented 9 years ago
almost applied your code in a separate project, only some tests fail

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 9:22

GoogleCodeExporter commented 9 years ago
Applied patch with r1200
I've created a separate project named ShapeFile.Extended, so anyone can check 
the changes and help me evaluate if this code can replace old code.
Ronen: can you review the code and let me see how I missed?

NOTE: I've not applied your fix for bug in the PointHandler class because broke 
other tests: can you submit the specified fix as a svn patch, so I can 
understand better what are the changes involved?

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 12:28

GoogleCodeExporter commented 9 years ago
Another note: some tests are failing, due to the lack of the PointHandler 
patch, I think

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 12:33

GoogleCodeExporter commented 9 years ago
Hi, Im part of Ronen's team.
Which tests fail?
The ones we added or the existing ones?

Thanks for the guidance, we'll attend to the issues in the next few days.

Original comment by avraha...@gmail.com on 3 Apr 2014 at 3:49

GoogleCodeExporter commented 9 years ago

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 4:06

GoogleCodeExporter commented 9 years ago
>The ones we added or the existing ones?
The one you added, but without the PointZM patch
If I apply the patch, some existing tests fail, but I'm not sure to applied the 
patch correctly

Original comment by diegogu...@gmail.com on 3 Apr 2014 at 4:08

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I downloaded your trunc and made the changes needed for our tests to pass.
I also added the patch to the PointHandler and a test that fails if the patch 
is not done.
In addition I made a slight tweak to the ReadMBRs method, specifically with the 
file stream management.

As the link you provided 
(http://ariejan.net/2007/07/03/how-to-create-and-apply-a-patch-with-subversion/)
 instructs, I made a path file which is added to this comment.
I also added the additional shp file needed for the added test.

Original comment by avraha...@gmail.com on 3 Apr 2014 at 6:37

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
A note: I've removed IFeature and let ShapefileFeature extends Feature, because 
I wanna limit the changes to NTS core code, and looks that ShapefileFeature 
introduce only the concept of "Id" that maybe isn't valid for generic feature

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 7:38

GoogleCodeExporter commented 9 years ago
@diegoguidi, @avrahams1, @ronen.schaffer,
we will be adding concept of IFeature in GeoAPI. When we do that, 
implementation of NTS Feature class will have to change, too. There is no 
Features namespace in JTS anyway, so we might as well factor it out into 
NetTopologySuite.Features project.

Original comment by felix.ob...@netcologne.de on 4 Apr 2014 at 7:45

GoogleCodeExporter commented 9 years ago
+1 for adding IFeature, looks as an easily improvement to me actually.
anyway, IFeature probably should not have FeatureId property, so we should have 
(for the purpose of this issue) also a IShapefileFeature, probably.

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 7:52

GoogleCodeExporter commented 9 years ago
Applied patch with r1202. all tests (the newer you submitted and the older in 
the codebase) are green. thanks for subbmitting this code :)

I leave the issue opened to suggestions about how to change/improve the code 
and how to do with new reader (replace the old one? leave both as is? move to 
other projects?)

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 12:08

GoogleCodeExporter commented 9 years ago
Maybe the FeatureId was too specific to our needs, but why not use the IFeature?

It makes it much easier to use Mocking testing and replace the implementation 
if needed.

I wouldn't mind if you choose to call it IShapefileFeature and add the ID 
property.

In my opinion our reader can replace the existing one, as we added in the 
ShapeReader implementation the enumerator that was available in the old version.

Thanks for your guidance, and we hope to continue being part of the NTS 
community.

Original comment by avraha...@gmail.com on 4 Apr 2014 at 12:30

GoogleCodeExporter commented 9 years ago
>why not use the IFeature
to me, we need an IFeature. simply we need to adapt the entire codebase to use 
this interface.

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 12:45

GoogleCodeExporter commented 9 years ago
>to me, we need an IFeature. simply we need to adapt the entire codebase to use 
this interface.
What needs to be adapted?
Its some refactoring, but there isn't any changes to the logic that need to be 
made.

About the FeatureID - We need this property for caching purposes, and I imagine 
we're not that only ones who need that kind of thing.
Any chance you're going to add an IShapefileFeature interface that contains the 
ID property?
You can take the already existing logic in the original code we've uploaded.

Original comment by avraha...@gmail.com on 4 Apr 2014 at 12:52

GoogleCodeExporter commented 9 years ago
>What needs to be adapted?
I think to serialization, geojson stuff, and so on. but looks a little work to 
do.
>Any chance you're going to add an IShapefileFeature interface that contains 
the ID property?
absolutely yes, +1 for IShapefileFeature 

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 12:59

GoogleCodeExporter commented 9 years ago
I made the changes with the new interfaces on the updated version, tests still 
pass.

I changed the normal Feature class to implement an IFeature interface, and 
created an IShapefileFeature interface in out extension project, which 
implements IFeature and adds an ID property.
The new ShapefileDataReader now returns an IShapefileFeature, and our 
ShapefileFeature doesn't implement the Feature class anymore.

In addition there was a test that kept failing because it was written in a 
timezone of utc+2 and the writer wrote the string hard-coded, and I live in a 
UTC+3 timezone so the strings didn't match.

Please let me know that you think about the changes.

Original comment by avraha...@gmail.com on 4 Apr 2014 at 1:44

Attachments:

GoogleCodeExporter commented 9 years ago
applied patch with some small changes. all looks ok, thanks again.

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 2:40

GoogleCodeExporter commented 9 years ago
applied ptch wirh r1203

Original comment by diegogu...@gmail.com on 4 Apr 2014 at 2:41

GoogleCodeExporter commented 9 years ago
Just one last question:
When is the release of the next version planned?

We want to merge these changes into our code, but we'd rather not take the 
development version from SVN.

Original comment by avraha...@gmail.com on 5 Apr 2014 at 7:10

GoogleCodeExporter commented 9 years ago
As I said, I'd like to move IFeature to GeoAPI, therefore I think we have to 
evaluate if and how it fits with other aproaches like e.g. the one used for 
different kinds of SharpMap.

Original comment by felix.ob...@netcologne.de on 7 Apr 2014 at 6:36

GoogleCodeExporter commented 9 years ago
>When is the release of the next version planned?
no plans at this time.
I don't know well nuget, but you can bould your own specific packages from the 
trunk via msbuild (msbuild /t:nugetpack teamcity.targets) 

Original comment by diegogu...@gmail.com on 7 Apr 2014 at 7:38

GoogleCodeExporter commented 9 years ago
Is there an interest in including ShapeFile spatial index to the codebase?
see: https://github.com/fobermaier/sbnsharp

Original comment by felix.ob...@netcologne.de on 21 May 2014 at 2:19

GoogleCodeExporter commented 9 years ago
Let me answer to your question with another question: it's maybe time to think 
about how to integrate shapefile code from/to NTS and SharpMap?
It's something possible, or that makes sense to you?

Original comment by diegogu...@gmail.com on 21 May 2014 at 2:49

GoogleCodeExporter commented 9 years ago
For SharpMap 3.0 where we will be working with IFeature this is definatly the 
way to go

Original comment by felix.ob...@netcologne.de on 21 May 2014 at 2:56

GoogleCodeExporter commented 9 years ago
Anyone for a nuget package

Original comment by felix.ob...@netcologne.de on 13 Aug 2014 at 4:07