CartoDB / crankshaft

CARTO Spatial Analysis extension for PostgreSQL
BSD 3-Clause "New" or "Revised" License
53 stars 20 forks source link

markov fails with some inputs #83

Open javisantana opened 8 years ago

javisantana commented 8 years ago

in some cases, when markov does not like the input raises an exception like:

ERROR:  IOError: [Errno 5] Input/output error
CONTEXT:  Traceback (most recent call last):
  PL/Python function "cdb_spatialmarkovtrend", line 6, in <module>
    return spatial_markov_trend(subquery, time_cols, num_classes, w_type, num_ngbrs, permutations, geom_col, id_col)
  PL/Python function "cdb_spatialmarkovtrend", line 78, in spatial_markov_trend
  PL/Python function "cdb_spatialmarkovtrend", line 440, in __init__
  PL/Python function "cdb_spatialmarkovtrend", line 484, in _calc
PL/Python function "cdb_spatialmarkovtrend"

The IOError is because pysql is doing a print https://github.com/pysal/pysal/blob/master/pysal/spatial_dynamics/markov.py#L485

Looking at pysal code I'm starting to be really afraid because it looks like not really production ready

javisantana commented 8 years ago

after some research i realized the problem is the permutations, setting up to a low value it works.

It'd be nice to know how permutations changes the quality of the analysis since it takes a lot of time when the value is high

andy-esch commented 8 years ago

I'm pretty sure we can safely set it to permutations=0, will take a look now. Some of the derivative statistics are calculated when permutations is greater than zero but I didn't use any of those on this release...

andy-esch commented 8 years ago

Yeah, so permutations=0 is fine here.

On the FMPT singularity IOError, we don't even use fmpt, so could use a fork of their code with the calculation of that removed. In fact, we could write a very slimmed down version of the Spatial_Markov class written as a separate module to avoid calculating a lot of the extraneous class members which are not ever used or output from CDB_SpatialMarkovTrend.

ljwolf commented 8 years ago

Would making some of these attributes properties work for y'all?

Typically, the size of enumerate(T) is small, so iterating more than once through it should be fine with upstream.

I'm thinking of the following patch:

@property
def F(self):
    F = np.zeros_like(self.T)
    try:
        return self._F
    except AttributeError:
        for i, mat in enumerate(self.T):
            row_sum = mat.sum(axis=1)
            row_sum += row_sum == 0
            p_i = np.matrix(np.diag(1.0 / row_Sum) * np.matrix(mat))
            try:
                F[i] = fmpt(p_i)
            except np.LinAlgError as e:
                warnings.Warn('Singular First Mean Passing Time computation for cell {}'.format(i))
        self._F = F
        return self._F
andy-esch commented 8 years ago

Works for me. What about replacing the outer try/except with an if statement on hasattr(self, "_F") instead to avoid having exception logic in the code?

ljwolf commented 8 years ago

Most of the attributes in spatial markov have been made properties, so you should only pay for what you use now.

Also, we're converting print to warnings.warn when encountered, so we cut down on the issues w/ plpython. Most of our user base is not in postgres, so we had no clue that was an issue.