AlertaDengue / PySUS

Library to download, clean and analyze openly available datasets from the Brazilian Universal Health System (SUS).
GNU General Public License v3.0

Large datasets split into multiple files raise ValueError: not enough values to unpack (expected 2, got 1) #201

Closed by anapaulagomes 1 month ago

anapaulagomes commented 1 month ago

Some states have such a large data volume in certain months that the data is split across several files instead of one. The first file follows the expected naming format, and each following file has a _<number> suffix before the .dbc extension. I've only tested this for SIA and "Boletins Individualizados." Examples:

from pysus.online_data.SIA import download
download('AC', 2019, 1, groups=["BI"])

# or

from pysus.ftp.databases.sia import SIA
sia = SIA().load()
sia.get_files("BI", uf="AL", year=2019)

Culprit:

BIMG2306.dbc, BIMG2306_1.dbc, BIMG2306_2.dbc
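
The naming issue can be illustrated with a small standalone sketch. This is not PySUS's actual code: the function name parse_sia_name, the SiaName tuple and the regular expression are hypothetical, and the layout (two-letter group, two-letter state, two-digit year, two-digit month, optional _<number> part) is only inferred from the file names above.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from names such as BIMG2306.dbc and
# BIMG2306_1.dbc; an assumption, not the parsing logic PySUS itself uses.
SIA_NAME = re.compile(
    r"^(?P<group>[A-Z]{2})(?P<uf>[A-Z]{2})(?P<year>\d{2})(?P<month>\d{2})"
    r"(?:_(?P<part>\d+))?\.dbc$",
    re.IGNORECASE,
)


class SiaName(NamedTuple):
    group: str
    uf: str
    year: int             # two-digit year exactly as it appears in the name
    month: int
    part: Optional[int]   # None for the first (unsuffixed) file


def parse_sia_name(filename: str) -> SiaName:
    """Parse a SIA file name, accepting an optional _<number> suffix."""
    match = SIA_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected SIA file name: {filename!r}")
    part = match["part"]
    return SiaName(
        group=match["group"].upper(),
        uf=match["uf"].upper(),
        year=int(match["year"]),
        month=int(match["month"]),
        part=int(part) if part is not None else None,
    )


for name in ["BIMG2306.dbc", "BIMG2306_1.dbc", "BIMG2306_2.dbc"]:
    print(parse_sia_name(name))
```

Treating the suffix as an optional field keeps the return shape the same for split and unsplit files, which matters if callers unpack the result.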

PySUS version: 0.14.1
OS: Mac OS M1 / Python 3.11.9

Maybe this is related to https://github.com/AlertaDengue/PySUS/issues/64. I'd be willing to open a PR with a fix for it; I need a confirmation of what needs to be done here since most format methods follow the same return signature.

anapaulagomes commented 1 month ago

This is already fixed, but the fix is not yet available via pip.

fccoelho commented 1 month ago

@luabida maybe we need a new minor release

luabida commented 1 month ago

Sorry about that, I thought it had already been published. It should work now:

In [3]: [f for f in sia.get_files("BI") if "_" in f.name]
Out[3]:
[BIMG2305_1.dbc,
 BIMG2305_2.dbc,
 BIMG2306_1.dbc,
 BIMG2306_2.dbc,
 BIMG2307_1.dbc,
 BIMG2307_2.dbc,
 BIMG2308_1.dbc,
 BIMG2308_2.dbc,
 BIMG2309_1.dbc,
 BIMG2309_2.dbc,
 BIMG2310_1.dbc,
 BIMG2310_2.dbc,
 BIMG2311_1.dbc,
 BIMG2311_2.dbc,
 BIMG2312_1.dbc,
 ...
 BISP2405_2.dbc]

In [4]: set([sia.format(f)[1] for f in sia.get_files("BI") if "_" in f.name])
Out[4]: {'MG', 'RJ', 'SP'}
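
To work with a whole month, the split parts presumably need to be collected together. A minimal sketch of one way to do that, using plain strings taken from the listing above rather than PySUS file objects; the helper name group_parts is hypothetical:

```python
from collections import defaultdict


def group_parts(names):
    """Group file names by base name, ignoring any _<number> suffix."""
    groups = defaultdict(list)
    for name in names:
        stem = name.rsplit(".", 1)[0]   # drop the .dbc extension
        base = stem.split("_", 1)[0]    # BIMG2306_1 -> BIMG2306
        groups[base].append(name)
    # Sort each group lexicographically so the unsuffixed file comes first.
    return {base: sorted(parts) for base, parts in groups.items()}


names = ["BIMG2306_2.dbc", "BIMG2306.dbc", "BIMG2306_1.dbc", "BIMG2305_1.dbc"]
print(group_parts(names))
```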

anapaulagomes commented 1 month ago

It works, thanks!