Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Different field_descriptions.json in Congress tutorial vs. repo #47

Closed by organisciak 8 years ago

organisciak commented 9 years ago

The tutorial for the Congress example asks you to paste in a custom field_descriptions.json:

[
   {"field":"date","datatype":"time","type":"numeric","unique":true,"derived":[{"resolution":"month"}]},
   {"field":"searchstring","datatype":"searchstring","type":"text","unique":true},
   {"field":"enacted","datatype":"categorical","type":"text","unique":false},
   {"field":"sponsor_state","datatype":"categorical","type":"text","unique":false},
   {"field":"cosponsors_state","datatype":"categorical","type":"text","unique":false},
   {"field":"chamber","datatype":"categorical","type":"text","unique":false}
]

However, the repo for congress_api already has a different, larger version:

[
    {"field":"date","datatype":"time","type":"numeric","unique":true,"derived":[{"resolution":"month"}, {"resolution":"year"}]},
    {"field":"searchstring","datatype":"searchstring","type":"text","unique":true},
    {"field":"chamber","datatype":"categorical","type":"text","unique":true},
    {"field":"awaiting_signature","datatype":"categorical","type":"text","unique":true},
    {"field":"enacted","datatype":"categorical","type":"text","unique":true},
    {"field":"vetoed","datatype":"categorical","type":"text","unique":true},
    {"field":"status","datatype":"categorical","type":"text","unique":true},
    {"field":"main_subject","datatype":"categorical","type":"text","unique":true},
    {"field":"subjects","datatype":"categorical","type":"text","unique":false},
    {"field":"sponsor_state","datatype":"categorical","type":"text","unique":true},
    {"field":"sponsor_name","datatype":"categorical","type":"text","unique":true},
    {"field":"sponsor_title","datatype":"categorical","type":"text","unique":true},
    {"field":"num_cosponsors","datatype":"categorical","type":"integer","unique":true},
    {"field":"thomas_id","datatype":"etc","type":"integer","unique":true}
]
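
To make the differences between the two versions concrete, here is a minimal sketch that compares them field by field. The helper names (`index_by_field`, `diff_field_descriptions`) are hypothetical and not part of BookwormDB; the two JSON strings are copied from the listings above.

```python
import json

# Tutorial version, copied from the first listing above.
TUTORIAL_JSON = """[
 {"field":"date","datatype":"time","type":"numeric","unique":true,"derived":[{"resolution":"month"}]},
 {"field":"searchstring","datatype":"searchstring","type":"text","unique":true},
 {"field":"enacted","datatype":"categorical","type":"text","unique":false},
 {"field":"sponsor_state","datatype":"categorical","type":"text","unique":false},
 {"field":"cosponsors_state","datatype":"categorical","type":"text","unique":false},
 {"field":"chamber","datatype":"categorical","type":"text","unique":false}
]"""

# Repo version, copied from the second listing above.
REPO_JSON = """[
 {"field":"date","datatype":"time","type":"numeric","unique":true,"derived":[{"resolution":"month"}, {"resolution":"year"}]},
 {"field":"searchstring","datatype":"searchstring","type":"text","unique":true},
 {"field":"chamber","datatype":"categorical","type":"text","unique":true},
 {"field":"awaiting_signature","datatype":"categorical","type":"text","unique":true},
 {"field":"enacted","datatype":"categorical","type":"text","unique":true},
 {"field":"vetoed","datatype":"categorical","type":"text","unique":true},
 {"field":"status","datatype":"categorical","type":"text","unique":true},
 {"field":"main_subject","datatype":"categorical","type":"text","unique":true},
 {"field":"subjects","datatype":"categorical","type":"text","unique":false},
 {"field":"sponsor_state","datatype":"categorical","type":"text","unique":true},
 {"field":"sponsor_name","datatype":"categorical","type":"text","unique":true},
 {"field":"sponsor_title","datatype":"categorical","type":"text","unique":true},
 {"field":"num_cosponsors","datatype":"categorical","type":"integer","unique":true},
 {"field":"thomas_id","datatype":"etc","type":"integer","unique":true}
]"""


def index_by_field(entries):
    """Map each field name to its full description dict."""
    return {entry["field"]: entry for entry in entries}


def diff_field_descriptions(tutorial, repo):
    """Return (only_in_tutorial, only_in_repo, changed) for two parsed lists.

    `changed` maps a shared field name to {key: (tutorial_value, repo_value)}
    for every key whose value differs between the two versions.
    """
    t, r = index_by_field(tutorial), index_by_field(repo)
    only_in_tutorial = sorted(set(t) - set(r))
    only_in_repo = sorted(set(r) - set(t))
    changed = {}
    for name in sorted(set(t) & set(r)):
        keys = sorted(set(t[name]) | set(r[name]))
        delta = {
            k: (t[name].get(k), r[name].get(k))
            for k in keys
            if t[name].get(k) != r[name].get(k)
        }
        if delta:
            changed[name] = delta
    return only_in_tutorial, only_in_repo, changed


only_tutorial, only_repo, changed = diff_field_descriptions(
    json.loads(TUTORIAL_JSON), json.loads(REPO_JSON)
)
print("Only in tutorial:", only_tutorial)
print("Only in repo:", only_repo)
for name, delta in changed.items():
    print("Differs:", name, delta)
```

Run on the two listings above, this shows that the gap is more than truncation: `cosponsors_state` appears only in the tutorial version, nine fields appear only in the repo version, and four shared fields differ (`date` gains a `year` resolution, while `chamber`, `enacted`, and `sponsor_state` switch `unique` from false to true).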

It is not clear why the tutorial uses a truncated example. I can imagine one valid reason: having the reader see and think about the format of the file, rather than just being told "it's all already in the repo".

The new Bookworm Docs don't seem to use the Congress example anymore, so this may not be too important, but the example is perfectly functional, so I think it's worth keeping around.

bmschmidt commented 8 years ago

I've updated the Congress repo example for the v0.4.0 readme.