Bookworm-project / BookwormAPI

An API implementing a grammar for text analysis
MIT License

"Integer" fields return floats #14

Open organisciak opened 9 years ago

organisciak commented 9 years ago

When I use "method: returnPossibleFields", it tells me that date_year is type: integer. However:

{"date_year":{"$gte":0}}

{"date_year":{"$lte":2015}}

bmschmidt commented 9 years ago

OK, can't fix this right away, but a couple notes:

Minimal example may be that "date_year":{"$gte":1} also returns with ".0" at the end.

The code that's supposed to be correcting this is quoted here:

        def fixNumpyType(input):
            #This is, weirdly, an occasional problem but not a constant one.
            if str(input.dtype)=="int64":
                return int(input)
            else:
                return input

I don't understand the internal types that numpy uses, but I suspect the problem is that we need to be coercing not just int64, but some of numpy's 11 (!) other integer types to a standard int class before letting the json serializer have at them. Maybe when the first number is zero, numpy gets different ideas about what to do with a field.
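For reference, here is a minimal sketch of how the standard json module treats numpy scalars. It assumes Python 3 with a recent numpy, which is not necessarily the environment the API runs in; the describe helper is hypothetical and exists only for this illustration.

```python
import json
import numpy as np

def describe(value):
    """Return the JSON text for a value, or the error json.dumps raises."""
    try:
        return json.dumps(value)
    except TypeError as err:
        return "TypeError: %s" % err

# np.float64 subclasses Python's float, so it serializes, and it keeps a decimal point.
float_result = describe(np.float64(2015))
# On Python 3, np.int64 does not subclass Python's int, so json rejects it.
int_result = describe(np.int64(2015))
# Once coerced to a built-in int, the value serializes without ".0".
coerced_result = describe(int(np.int64(2015)))

print(float_result)
print(int_result)
print(coerced_result)
```

If a value is already a numpy float by the time it is serialized, it will come out with ".0" regardless of what the field's declared type is.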

organisciak commented 9 years ago

issubclass(input.dtype.type, np.integer) seems to work.

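import numpy as np

def printtype(a):
    # Report the array's dtype and which abstract numpy category it belongs to.
    print(a.dtype)
    print("\tINT?", issubclass(a.dtype.type, np.integer))
    # np.floating is the abstract float type; the np.float alias was removed in NumPy 1.24.
    print("\tFLOAT?", issubclass(a.dtype.type, np.floating))

printtype(np.array([1,2,3]))
printtype(np.array([1,2,3], dtype="uint64"))
printtype(np.array([1.0,2,3]))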

Output:

int32
        INT? True
        FLOAT? False
uint64
        INT? True
        FLOAT? False
float64
        INT? False
        FLOAT? True

organisciak commented 9 years ago

I added a test folder on bookworm.htrc.illinois.edu (simply add test/ after cgi-bin/) and tried it out. It looks like fixNumpyType isn't even being called, though.

bmschmidt commented 9 years ago

Yeah, looks like I was wrong. When all numbers are coerced to int with the following code (which is unacceptable as a fix, because WordsPerMillion is a float), it does return ints. (I also had to add the call in a few other places.) So apparently either numpy or MySQLdb is being a little finicky about when to return floats and when to return ints. (Or, I suppose, it's even possible that some of these are strings; that seems pretty unlikely, but not much more unlikely than floats.)

        def fixNumpyType(input):
            #This is, weirdly, an occasional problem but not a constant one.
            if issubclass(input.dtype.type, np.integer):
                return int(input)
            else:
                #Diagnostic only: coerce everything else to int too, to see whether the output changes.
                return int(input)
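A version that keeps floats intact might look something like the sketch below. This is not the code in the repository; fix_numpy_type is a hypothetical name, and it assumes the values reaching it are numpy scalars rather than strings or values that were already floats when they came back from the database.

```python
import numpy as np

def fix_numpy_type(value):
    """Convert numpy scalars to built-in Python numbers, preserving int vs. float."""
    if isinstance(value, np.integer):
        # Covers every numpy integer width, signed or unsigned, not just int64.
        return int(value)
    if isinstance(value, np.floating):
        # Leaves fields such as WordsPerMillion as floats.
        return float(value)
    # Anything else (strings, built-in numbers) passes through unchanged.
    return value

year = fix_numpy_type(np.uint16(2015))
rate = fix_numpy_type(np.float32(0.5))
label = fix_numpy_type("unchanged")
print(type(year).__name__, type(rate).__name__, label)
```

This only addresses the serialization side. If date_year is already a float when it arrives from the query, a type-preserving coercion like this will still return it with ".0".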