CartoDB / crankshaft

CARTO Spatial Analysis extension for PostgreSQL
BSD 3-Clause "New" or "Revised" License
54 stars 20 forks source link

Pysal spint - work in progress #147

Open TaylorOshan opened 7 years ago

TaylorOshan commented 7 years ago

The purpose of this PR is to track progress towards integrating the SpInt module of pysal into crankshaft. So far a first pass have been completed for the gravity class and the production class, which still leaves the attraction class and the doubly class.

TaylorOshan commented 7 years ago

@andy-esch I have the 4 main classes sorted out, but I still need to implement classes to carry out the "local" versions of gravity, production, and attraction classes. I think these will need to be separate classes since they have different (more) output. I think I should be able to finish by the end of tomorrow, and then I think the SpInt stuff will be ready to go.

TaylorOshan commented 7 years ago

Ok, I've finished adding classes for all four of the global spatial interaction models (gravity, production, attraction, doubly) and 3 additional classes for local spatial interaction models (local_gravity, local_production, and local_attraction), so this should be ready to go.

The local production and attraction models do not have any additional inputs because the production constrained must be focused on flows from origin and the attraction constrained must be focused on flows to destinations. However, the local gravity model can be focused on either origins or destinations, so there is an additional input that is the either the origin or the destination id's.

andy-esch commented 7 years ago

Hey @TaylorOshan, running through the code, it looks like the two local functions do not work from d41112a onward. The error I'm getting is happening in line 656 of gravity.py (similar for the local versions of Production and Attraction) with this piece of code:

offset = len(np.unique(locs))

It turns out that in NumPy v1.6.1, which is what crankshaft is using right now, np.unique(None) throws a TypeError. v1.12.0 returns array([None]). I wonder if in your docker install the numpy version differed?

Since the offset value used in these class methods is calculated from len(np.unique(locs)) (where locs defaults to None), it's erroring when I run these functions in development.

I'm going to patch it by setting the offset to 1 (not zero, right?) if locs is None, and len(np.unique(locs)) otherwise. How does that sound?

TaylorOshan commented 7 years ago

Taking a closer look at this right now!

On Wed, Jan 25, 2017 at 7:02 PM, Andy Eschbacher notifications@github.com wrote:

Hey @TaylorOshan https://github.com/TaylorOshan, running through the code, it looks like all three of the local functions do not work from d41112a https://github.com/CartoDB/crankshaft/commit/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43 onward. The error I'm getting is happening in line 656 of gravity.py https://github.com/TaylorOshan/crankshaft/blob/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43/src/py/crankshaft/crankshaft/regression/spint/base/gravity.py#L656 (similar for the local versions of Production and Attraction) with this piece of code:

offset = len(np.unique(locs))

It turns out that in NumPy v1.6.1, which is what crankshaft is using https://github.com/CartoDB/crankshaft/blob/develop/src/py/crankshaft/requirements.txt#L2 right now, np.unique(None) throws a TypeError. v1.12.0 returns array([None]). I wonder if in your docker install the numpy version differed?

Since the offset value used in these class methods is calculated from len(np.unique(locs)) (where locs defaults to None), it's erroring when I run these functions in development.

I'm going to patch it by setting the offset to 1 (not zero, right?) if locs is none, and len(np.unique(locs)) otherwise. How does that sound?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/crankshaft/pull/147#issuecomment-275290266, or mute the thread https://github.com/notifications/unsubscribe-auth/AHvdzYx8anORWN_REjp_6wlPB99VEj-sks5rV_6ygaJpZM4LAcWj .

TaylorOshan commented 7 years ago

Ok, so it shouldn't be 0 or 1, since None suggests we use all the locations which will always be > 1. Basically, if n = len(np.unique(locs)) then there will be n fixed effects (constraints) and we want to always get the parameter estimates after that and that's what offset is used for.

And just realizing one chance is needed. Since, I am already doing something similar to this later in the method with these lines https://github.com/TaylorOshan/crankshaft/blob/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43/src/py/crankshaft/crankshaft/regression/spint/base/gravity.py#L671-L672 then I think this can most succinctly be solved by something like:

if locs is None: locs = np.unique(self.o) offset = len(locs) else: offset = len(np.unique(locs))

at the top of the method.

Does that make sense?

On Thu, Jan 26, 2017 at 1:44 PM, Taylor Oshan tayoshan@gmail.com wrote:

Taking a closer look at this right now!

On Wed, Jan 25, 2017 at 7:02 PM, Andy Eschbacher <notifications@github.com

wrote:

Hey @TaylorOshan https://github.com/TaylorOshan, running through the code, it looks like all three of the local functions do not work from d41112a https://github.com/CartoDB/crankshaft/commit/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43 onward. The error I'm getting is happening in line 656 of gravity.py https://github.com/TaylorOshan/crankshaft/blob/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43/src/py/crankshaft/crankshaft/regression/spint/base/gravity.py#L656 (similar for the local versions of Production and Attraction) with this piece of code:

offset = len(np.unique(locs))

It turns out that in NumPy v1.6.1, which is what crankshaft is using https://github.com/CartoDB/crankshaft/blob/develop/src/py/crankshaft/requirements.txt#L2 right now, np.unique(None) throws a TypeError. v1.12.0 returns array([None]). I wonder if in your docker install the numpy version differed?

Since the offset value used in these class methods is calculated from len(np.unique(locs)) (where locs defaults to None), it's erroring when I run these functions in development.

I'm going to patch it by setting the offset to 1 (not zero, right?) if locs is none, and len(np.unique(locs)) otherwise. How does that sound?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/crankshaft/pull/147#issuecomment-275290266, or mute the thread https://github.com/notifications/unsubscribe-auth/AHvdzYx8anORWN_REjp_6wlPB99VEj-sks5rV_6ygaJpZM4LAcWj .

TaylorOshan commented 7 years ago

And this would be just for production and attraction and not for gravity

On Thu, Jan 26, 2017 at 2:05 PM, Taylor Oshan tayoshan@gmail.com wrote:

Ok, so it shouldn't be 0 or 1, since None suggests we use all the locations which will always be > 1. Basically, if n = len(np.unique(locs)) then there will be n fixed effects (constraints) and we want to always get the parameter estimates after that and that's what offset is used for.

And just realizing one chance is needed. Since, I am already doing something similar to this later in the method with these lines https://github.com/TaylorOshan/crankshaft/blob/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43/src/py/crankshaft/crankshaft/regression/spint/base/gravity.py#L671-L672 then I think this can most succinctly be solved by something like:

if locs is None: locs = np.unique(self.o) offset = len(locs) else: offset = len(np.unique(locs))

at the top of the method.

Does that make sense?

On Thu, Jan 26, 2017 at 1:44 PM, Taylor Oshan tayoshan@gmail.com wrote:

Taking a closer look at this right now!

On Wed, Jan 25, 2017 at 7:02 PM, Andy Eschbacher < notifications@github.com> wrote:

Hey @TaylorOshan https://github.com/TaylorOshan, running through the code, it looks like all three of the local functions do not work from d41112a https://github.com/CartoDB/crankshaft/commit/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43 onward. The error I'm getting is happening in line 656 of gravity.py https://github.com/TaylorOshan/crankshaft/blob/d41112ae1e697ea3b6b5b5b1f1beeb0c3157ec43/src/py/crankshaft/crankshaft/regression/spint/base/gravity.py#L656 (similar for the local versions of Production and Attraction) with this piece of code:

offset = len(np.unique(locs))

It turns out that in NumPy v1.6.1, which is what crankshaft is using https://github.com/CartoDB/crankshaft/blob/develop/src/py/crankshaft/requirements.txt#L2 right now, np.unique(None) throws a TypeError. v1.12.0 returns array([None]). I wonder if in your docker install the numpy version differed?

Since the offset value used in these class methods is calculated from len(np.unique(locs)) (where locs defaults to None), it's erroring when I run these functions in development.

I'm going to patch it by setting the offset to 1 (not zero, right?) if locs is none, and len(np.unique(locs)) otherwise. How does that sound?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/crankshaft/pull/147#issuecomment-275290266, or mute the thread https://github.com/notifications/unsubscribe-auth/AHvdzYx8anORWN_REjp_6wlPB99VEj-sks5rV_6ygaJpZM4LAcWj .

andy-esch commented 7 years ago

Nice -- I'll add that :) Thanks!

TaylorOshan commented 7 years ago

Sounds good! I think I need to update my tests to include the scenario that locs!=None. I think I always test with locs=None because its the most probably and simplest use case.

On Thu, Jan 26, 2017 at 2:10 PM, Andy Eschbacher notifications@github.com wrote:

Nice -- I'll add that :) Thanks!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/crankshaft/pull/147#issuecomment-275514748, or mute the thread https://github.com/notifications/unsubscribe-auth/AHvdzZ3cgVcUB2MKN4-fwReNu1rhX4xEks5rWQvFgaJpZM4LAcWj .

andy-esch commented 7 years ago

I'm seeing IndexError: index out of bounds now for lines 690 and 895 of /src/py/crankshaft/crankshaft/regression/spint/base/gravity.py in 14be993.

TaylorOshan commented 7 years ago

Ok, so apologies on this, I was way too hasty and think I confused myself between the "global" spatial interaction models and these local versions. You were totally right and this offset is always 1. For the local models, each individual model uses the flows from one origin (destination) to all destinations (origins) and therefore the origin (destination) specific fixed effect always has one level and it is the coefficient for this level that we are skipping by using the offset. Setting offset to 1 is making all of my tests pass, which include scenarios where locs != None.

I think the statement:

if locs is None: locs = np.unique(self.o)

can be moved back to its original placement or it can stay at the top, it shouldn't make a difference.

On Thu, Jan 26, 2017 at 3:08 PM, Andy Eschbacher notifications@github.com wrote:

I'm seeing IndexError: index out of bounds now for lines 690 and 895 in 14be993 https://github.com/CartoDB/crankshaft/commit/14be9935a812ddbb84dca1321017ce5a35073de5 .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/crankshaft/pull/147#issuecomment-275529671, or mute the thread https://github.com/notifications/unsubscribe-auth/AHvdzZCUV19-XM9HBD0DVMLMmFx1PX_7ks5rWRlpgaJpZM4LAcWj .

andy-esch commented 7 years ago

Great, just fixed it:

offset = 1
if locs is None:
    locs = np.unique(self.o)  # for Production; swap in self.d for the Attraction case