AlertaDengue / PySUS

Library to download, clean, and analyze openly available datasets from the Brazilian Universal Health System (SUS).
GNU General Public License v3.0
175 stars, 68 forks

perf(sinan): remove unnecessary cwd's in FTP_SINAN #123

Closed. luabida closed 1 year ago

luabida commented 1 year ago

Listing files on the FTP server depends heavily on the server connection. In tests with Airflow, it behaved inconsistently. This PR therefore tries to reduce the number of server requests made by the FTP_SINAN class.
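
To illustrate the general idea of issuing fewer FTP requests, here is a minimal sketch. It is not the code from this PR. The class name `CachedFTPLister`, the `connect` factory, and the host and path in the usage note are assumptions made for the example. It relies on `ftplib.FTP.nlst` accepting a path argument, so a directory can be listed without a preceding `cwd`, and it caches each listing so repeated lookups do not contact the server again.

```python
from ftplib import FTP
from typing import Callable, Dict, List


class CachedFTPLister:
    """Hypothetical helper (not part of PySUS): lists each remote
    directory at most once and serves later lookups from memory."""

    def __init__(self, connect: Callable[[], FTP]) -> None:
        # `connect` is an assumed factory returning a logged-in FTP object.
        self._connect = connect
        self._cache: Dict[str, List[str]] = {}

    def list_dir(self, path: str) -> List[str]:
        """Return the file names under `path`, hitting the server only
        the first time a given path is requested."""
        if path not in self._cache:
            ftp = self._connect()
            try:
                # Passing the path to nlst avoids a separate cwd request.
                self._cache[path] = ftp.nlst(path)
            finally:
                ftp.close()
        return self._cache[path]
```

Usage would look roughly like `CachedFTPLister(connect).list_dir("/some/remote/dir")`, where `connect` opens and logs in to the server (for example `FTP("ftp.example.org")` followed by `login()`; both the host and the directory here are placeholders). Deferring the first listing until it is actually needed also keeps object construction cheap, which matters when the constructor runs inside a scheduler such as Airflow.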

luabida commented 1 year ago

  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:40:51  Samples:  1930
 /_//_/// /_\ / //_// / //_'/ //     Duration: 19.230    CPU time: 2.457
/   _/                      v4.4.0

Program: /home/luabida/Projetos/EGH/EpiGraphHub/containers/airflow/dags/brasil/sinan.py

19.227 <module>  sinan.py:1
├─ 18.069 task_flow_for  sinan.py:59
│  ├─ 17.605 FTP_SINAN.__init__  pysus/online_data/__init__.py:650
│  │  └─ 17.605 FTP.nlst  ftplib.py:547
│  │        [60 frames hidden]  ftplib, socket, .., re, sre_compile, ...
│  └─ 0.249 _TaskDecorator.__call__  airflow/decorators/base.py:310
│        [199 frames hidden]  airflow, abc, .., enum, logging, copy...
├─ 0.426 <module>  airflow/decorators/__init__.py:17
│     [1778 frames hidden]  airflow, flask, werkzeug, http, re, s...
├─ 0.411 <module>  epigraphhub/data/brasil/sinan/__init__.py:1
│     [3 frames hidden]  epigraphhub, ..
│        0.407 <module>  pysus/online_data/__init__.py:1
│        └─ 0.236 <module>  pandas/__init__.py:3
│              [584 frames hidden]  pandas, pyarrow, copy, .., textwrap, ...
└─ 0.233 <module>  airflow/__init__.py:18
      [871 frames hidden]  airflow, importlib, sqlalchemy, textw...

github-actions[bot] commented 1 year ago

:tada: This PR is included in version 0.9.0 :tada:

Your semantic-release bot :package::rocket: