AlertaDengue / PySUS

Library to download, clean and analyze openly available datasets from the Brazilian Universal Health System (SUS).
GNU General Public License v3.0

feat(sinan): add more parsed columns to final dataframe #119

Closed: luabida closed this 1 year ago

luabida commented 1 year ago

before:

   DT_SIN_PRI
0  20080405
1  20080404
2  20080403

after:

   DT_SIN_PRI
0  2008-04-05
1  2008-04-04
2  2008-04-03
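
The before/after change above amounts to parsing compact `YYYYMMDD` strings into real dates. The following is only a minimal standard-library sketch of that idea, not the code from this PR: the helper name `parse_sinan_date` is hypothetical, and returning `None` for values that fail to parse is an assumption. On a DataFrame, `pd.to_datetime(df["DT_SIN_PRI"], format="%Y%m%d")` would be the usual pandas equivalent.

```python
from datetime import date, datetime
from typing import Optional


def parse_sinan_date(raw: Optional[str]) -> Optional[date]:
    """Parse a compact YYYYMMDD string into a date.

    Hypothetical helper: returns None for missing values or for text
    that strptime cannot parse (an assumption, not the PR's behaviour).
    """
    if raw is None:
        return None
    text = str(raw).strip()
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        return None


# Values taken from the "before" column shown above.
raw_column = ["20080405", "20080404", "20080403"]
parsed_column = [parse_sinan_date(value) for value in raw_column]

for parsed in parsed_column:
    print(parsed)  # ISO format, e.g. 2008-04-05
```

Mapping unparseable values to `None` keeps one bad record from aborting a whole load, at the cost of silently hiding it; raising instead may be preferable when data quality needs to be audited.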
luabida commented 1 year ago

@fccoelho all decoder tests are failing, but I don't see a relation between the column types and the decoders. PR is ready to merge

fccoelho commented 1 year ago

> @fccoelho all decoder tests are failing, but I don't see a relation between the column types and the decoders. PR is ready to merge

Could you open a separate issue to fix the decoders?

luabida commented 1 year ago

> @fccoelho Could you open a separate issue to fix the decoders?

Yes. Can you wait until later this morning to merge this PR? I'm trying to fix the error "ValueError: A string literal cannot contain NUL (0x00) characters." that is still raised when inserting with pangres, but I have to pick up Isa at the bus station now. About the NUL char, I found this issue here; what do you think about replace("\x00", "\uFFFD")?

Traceback (most recent call last):
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/decorators/base.py", line 179, in execute
    return_value = super().execute(context)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/operators/python.py", line 171, in execute
    return_value = self.execute_callable()
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/airflow/operators/python.py", line 189, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/brasil/sinan.py", line 134, in upload
    raise e
  File "/opt/airflow/dags/brasil/sinan.py", line 130, in upload
    loading.upload(disease=disease, parquet_dir=dir)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/epigraphhub/data/brasil/sinan/loading.py", line 52, in upload
    raise e
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/epigraphhub/data/brasil/sinan/loading.py", line 37, in upload
    upsert(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/core.py", line 302, in upsert
    executor.execute(connectable=con, if_row_exists=if_row_exists, chunksize=chunksize)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/executor.py", line 87, in execute
    pse.upsert(if_row_exists=if_row_exists, chunksize=chunksize)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/engine.py", line 551, in upsert
    upq.execute(db_type=self._db_type, values=chunk, if_row_exists=if_row_exists)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/pangres/upsert_query.py", line 231, in execute
    return self.connection.execute(query)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2128, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/opt/conda/envs/epigraphhub/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
ValueError: A string literal cannot contain NUL (0x00) characters.

fccoelho commented 1 year ago

> About the NUL char, I found this issue here; what do you think about replace("\x00", "\uFFFD")?

I think we should just replace with an empty string: replace("\x00", "").
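
As a rough illustration of the suggestion above (not the change that was actually merged), NUL characters can be stripped from every string value before the rows reach the database driver. The helper names `strip_nul` and `clean_rows` and the sample field names are hypothetical. On a DataFrame, the pandas equivalent for a string column would be `df[col].str.replace("\x00", "", regex=False)`.

```python
from typing import Any, Dict, List

NUL = "\x00"


def strip_nul(value: Any) -> Any:
    """Remove NUL characters from strings; leave other types untouched."""
    if isinstance(value, str):
        return value.replace(NUL, "")
    return value


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return new row dicts with NUL characters stripped from all string values."""
    return [{key: strip_nul(val) for key, val in row.items()} for row in rows]


# Hypothetical sample rows; field names are for illustration only.
rows = [
    {"code": "A90\x00", "year": 2008},
    {"code": "\x00", "year": 2009},
]
cleaned = clean_rows(rows)
print(cleaned)
```

Replacing with an empty string drops the byte entirely, so a value made up only of NUL characters becomes an empty string. Substituting "\uFFFD" instead would keep a visible marker of where the bad byte was.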

luabida commented 1 year ago

@fccoelho Done

github-actions[bot] commented 1 year ago

:tada: This PR is included in version 0.9.0 :tada:

Your semantic-release bot :package::rocket: