Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License
84 stars 12 forks source link

integers in jsoncatalog #65

Closed organisciak closed 9 years ago

organisciak commented 9 years ago

variableSet.py has a line that checks if an input is a string or list. This means numbers, which are valid in JSON, are expected to be strings.

Is this by design, relative to to the DB ingest, or an oversight? If integers should be supported, the fix would be:

+++ b/bookworm/variableSet.py
@@ -606,7 +606,7 @@ class variableSet:
                  #Each of these has a different file it must write to...
                 outfile = variable.output
                 lines = entry.get(variable.field, [])
-                if isinstance(lines, basestring):
+                if isinstance(lines, (basestring, int)):

If integers should be quoted as strings, perhaps they should be cast?

bmschmidt commented 9 years ago

It's an oversight. (I don't think we've built one where there was a numeric value that was not distinct for the document, so it's never thrown; is it happening here?).

In the case that it's an int, it will need to be coerced to Unicode before being slotted into the list so that it can be written a few lines down. (Ie, line 618 would have to read "to_unicode(line)" instead of just "line".

I don't know what you mean by "cast" in this context.

organisciak commented 9 years ago

Type casting, exactly what you're suggesting. Java/C language.

On Mon, Mar 23, 2015, 6:15 PM Benjamin Schmidt notifications@github.com wrote:

It's an oversight. (I don't think we've built one where there was a numeric value that was not distinct for the document, so it's never thrown; is it happening here?).

In the case that it's an int, it will need to be coerced to Unicode before being slotted into the list so that it can be written a few lines down. (Ie, line 618 would have to read "to_unicode(line)" instead of just "line".

I don't know what you mean by "cast" in this context.

— Reply to this email directly or view it on GitHub https://github.com/Bookworm-project/BookwormDB/issues/65#issuecomment-85276215 .

bmschmidt commented 9 years ago

Super. Thanks for the catch!

Fixed: https://github.com/Bookworm-project/BookwormDB/commit/574ff7d930fbf25fd7c0b0acfe04ca412aafd691