epogrebnyak / data-rosstat-kep

Time series dataset of Rosstat Short-term Economic Indicators ("KEP") publication
http://www.gks.ru/wps/wcm/connect/rosstat_main/rosstat/ru/statistics/publications/catalog/doc_1140080765391

var_names.py #52

Closed epogrebnyak closed 8 years ago

epogrebnyak commented 8 years ago

Running python -m kep.selector.var_names generates output\varnames.md, which is missing variable names in the second column (they show up as <...>). We need to find out why and demonstrate it in code.

| PROD_TRANS_rog  | <...> | в % к предыдущему периоду              |
| PROD_TRANS_rytd | <...> | <...>                                  |
| PROD_TRANS_yoy  | <...> | в % к аналог. периоду предыдущего года |
| PROD_rog        | <...> | в % к предыдущему периоду              |
| PROD_rytd       | <...> | <...>                                  |
| PROD_yoy        | <...> | в % к аналог. периоду предыдущего года |

epogrebnyak commented 8 years ago
# Some prototypes
# Rules: no lambda functions, no names like 'xx', no expressions three lines long
def width_dict(table):
    pass

TABLE_HEADER = #...
ROW2_CONTENT = ":-" * 3

def in_pipes(values, widths):
    pass

part1 = in_pipes(TABLE_HEADER)
part2 = in_pipes(ROW2_CONTENT)
part3 = ...

# Needs refactoring -----------------------------

def get_max_widths(table):
    """Returns a list of maximum lengths of variable names, text descriptions and units of measurement."""
    xx = [[len(value) for value in row] for row in table]
    max_widths = [max(xx, key = lambda x: x[i])[i] for i in range(len(xx[0]))]
    return max_widths

def pure_tabulate(table, header = TABLE_HEADER):
    # must pass test_pure_tabulate() below
    width = get_max_widths(table)
    width_dict = {'width{}'.format(i):width[i] for i in range(len(width))} 
    part1 = ("| "  + '{:<{width0}}' + " | "  + '{:<{width1}}' + " | "  + '{:<{width2}}'  + " |\n").format('Код','Описание','Ед.изм.',**width_dict)
    part2 = ("|:-" + '{:-<{width0}}' + "|:-" + '{:-<{width1}}' + "|:-" + '{:-<{width2}}' +  "|\n").format('','','', **width_dict) 
    part3 = "\n".join([("| " + '{:<{width0}}' + " | " + '{:<{width1}}' + " | " + '{:<{width2}}' + " |").format(vn,desc,unit,**width_dict) for vn, desc, unit in table])
    return part1 + part2 + part3

def test_pure_tabulate():
    import tabulate
    table = get_var_list_components() 
    assert pure_tabulate(table, TABLE_HEADER) == tabulate.tabulate(table, TABLE_HEADER, tablefmt="pipe")

# End of refactoring -----------------------------
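
One possible shape for the refactoring, following the rules above (no lambdas, no names like 'xx', short expressions). This is only a sketch: the value of TABLE_HEADER is assumed from the string literals in pure_tabulate, the helper names column_widths, format_row and format_separator are hypothetical, and the result has not been checked here against tabulate.tabulate(..., tablefmt="pipe"). Unlike get_max_widths, it also counts the header when computing widths.

```python
# Assumed value, taken from the literals used in pure_tabulate().
TABLE_HEADER = ["Код", "Описание", "Ед.изм."]


def column_widths(rows):
    """Return the maximum cell length in each column of rows."""
    return [max(len(cell) for cell in column) for column in zip(*rows)]


def format_row(values, widths):
    """Left-justify each value to its column width and wrap it in pipes."""
    cells = [value.ljust(width) for value, width in zip(values, widths)]
    return "| " + " | ".join(cells) + " |"


def format_separator(widths):
    """Build the second row: ':' for left alignment, then dashes."""
    cells = [":" + "-" * (width + 1) for width in widths]
    return "|" + "|".join(cells) + "|"


def pure_tabulate(table, header=TABLE_HEADER):
    """Render table as a pipe-delimited markdown table with a header."""
    # The header takes part in the width calculation too.
    widths = column_widths([list(header)] + [list(row) for row in table])
    lines = [format_row(header, widths), format_separator(widths)]
    lines.extend(format_row(row, widths) for row in table)
    return "\n".join(lines)
```

test_pure_tabulate() should still be the judge of whether this matches tabulate exactly.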
alexanderlukanin13 commented 8 years ago

@epogrebnyak Should I also fix <...>?

epogrebnyak commented 8 years ago

@alexanderlukanin13
- [ ] Move https://github.com/epogrebnyak/rosstat-kep-data/blob/master/kep/query/var_names.py#L101-L150 and line 91 (with the constant) to a new file, file_io.tabulate.py
- [ ] Tests associated with file_io.tabulate.py should go to tests/test_file_io_tabulate.py
- [ ] Write a test that checks there is no 'FILLER' in the second column, in a new file tests/test_query_var_names.py (this test will likely fail now)
- [ ] Move the other tests/assert statements from var_names.py to tests/test_query_var_names.py
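
The 'FILLER' check from the list above could look roughly like this. It is a sketch under assumptions: the filler is taken to be the <...> text seen in the table, rows_with_filler is a hypothetical helper, and the sample rows stand in for the real output of get_var_list_components().

```python
# Assumption: the filler is the placeholder text seen in the generated table.
FILLER = "<...>"


def rows_with_filler(table, filler=FILLER):
    """Return rows whose second column (description) is just the filler."""
    return [row for row in table if row[1].strip() == filler]


def test_no_filler_in_second_column():
    # Hypothetical sample rows; the real test would use get_var_list_components().
    table = [
        ("I_rog", "Инвестиции в основной капитал", "в % к предыдущему периоду"),
        ("I_yoy", "Инвестиции в основной капитал", "в % к аналог. периоду предыдущего года"),
    ]
    assert rows_with_filler(table) == []
```

Returning the offending rows instead of a plain True/False makes a failing test show which variables lack a description.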

alexanderlukanin13 commented 8 years ago

@epogrebnyak How do I reproduce the <...> problem? I ran python -m kep.query.var_names, but I only see this:

| Код       | Описание                      | Ед.изм.                                |
|:----------|:------------------------------|:---------------------------------------|
| I_bln_rub | Инвестиции в основной капитал | млрд. руб.                             |
| I_rog     | Инвестиции в основной капитал | в % к предыдущему периоду              |
| I_yoy     | Инвестиции в основной капитал | в % к аналог. периоду предыдущего года |
epogrebnyak commented 8 years ago

Let's consider the <...> problem solved; I do not see it in the bigger table either.

epogrebnyak commented 8 years ago

You may still write a test for it, but it would pass now.