AlertaDengue / PySUS

Library to download, clean and analyze openly available datasets from the Brazilian universal health system, SUS.
GNU General Public License v3.0

[Performance] Fix TODO on recursive parent Directory initialization #182

Open · luabida opened this issue 10 months ago

luabida commented 10 months ago

The delay when calling a database comes from the recursive parent initialization of a Directory: it performs an FTP CWD command for each parent directory up to the root (/). This is unnecessary, because once the CWD for the requested path succeeds without a NotADirectory error, every parent directory is guaranteed to exist on the FTP server. Instead, it should store every parent in CACHE with loaded set to False, skipping the command.
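A minimal sketch of the idea above, assuming a simplified, hypothetical `Directory` class, a plain dict standing in for `CACHE`, and a `cwd` callable (for example a bound `ftplib.FTP.cwd`) that raises when the path does not exist. It issues a single CWD for the requested path and then registers every ancestor with `loaded` set to `False`. This is an illustration, not the PySUS implementation.

```python
from __future__ import annotations

import posixpath
from typing import Callable, Dict, Optional

# Hypothetical module-level cache keyed by absolute path; stands in for PySUS's CACHE
CACHE: Dict[str, Directory] = {}


class Directory:
    """Simplified, hypothetical directory node (not the PySUS class)."""

    def __init__(self, path: str, parent: Optional[Directory]):
        self.path = path
        self.name = posixpath.basename(path) or "/"
        self.parent = parent
        self.loaded = False  # Content has not been listed from the server yet
        self.content: Dict[str, Directory] = {}


def get_directory(path: str, cwd: Callable[[str], None]) -> Directory:
    """Return the node for `path`, issuing at most one CWD command."""
    path = posixpath.normpath(path)
    if path in CACHE:
        return CACHE[path]

    # One round trip: if CWD succeeds, every ancestor must exist as well
    cwd(path)

    node = CACHE.get("/")
    if node is None:
        node = CACHE["/"] = Directory("/", parent=None)
    current = "/"
    for part in (p for p in path.split("/") if p):
        current = posixpath.join(current, part)
        child = CACHE.get(current)
        if child is None:
            # Register the ancestor as not loaded, without any extra CWD
            child = CACHE[current] = Directory(current, parent=node)
            node.content[part] = child  # Link the child under its parent
        node = child
    return node
```

With this shape, looking up a deep path sends one CWD instead of one per ancestor, and any later lookup of one of its parents is served from the cache without contacting the server.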

fccoelho commented 10 months ago

Can we avoid doing this?

luabida commented 10 months ago

> Can we avoid doing this?

One approach, rather than using the recursive method, would be to parse all parent directories individually and link them explicitly through Directory.parent (more or less "hard-coded"), instantiating them with loaded set to False but with their child directory already in their content (somewhat like a linked tree structure). It would require a special case in load(), though.
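A rough sketch of this linked-tree variant, using a hypothetical `Node` class and a `listdir` callable standing in for the FTP directory listing (neither is PySUS API). Parents are created unloaded with their child already in `content`, and `load()` special-cases those pre-linked children by keeping them instead of replacing them.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional


class Node:
    """Hypothetical directory node illustrating the linked-tree idea only."""

    def __init__(self, name: str, parent: Optional[Node] = None):
        self.name = name
        self.parent = parent
        self.loaded = False
        self.content: Dict[str, Node] = {}
        if parent is not None:
            parent.content[name] = self  # Parent knows this child before loading

    @property
    def path(self) -> str:
        if self.parent is None:
            return "/"
        return self.parent.path.rstrip("/") + "/" + self.name

    def load(self, listdir: Callable[[str], List[str]]) -> Node:
        """Fill in content from the server, keeping pre-linked children."""
        if self.loaded:
            return self
        for name in listdir(self.path):
            # Special case: skip names already linked when the tree was built,
            # so existing child objects (and their parent links) stay valid
            if name not in self.content:
                Node(name, parent=self)
        self.loaded = True
        return self


def link_parents(path: str) -> Node:
    """Create every ancestor of `path` as an unloaded node, with no FTP calls."""
    node = Node("")  # Root node; its path is "/"
    for part in (p for p in path.split("/") if p):
        node = Node(part, parent=node)
    return node
```

Keeping the existing child objects in `load()` matters because callers may already hold references to them; replacing them would leave two objects for the same path. A real version would also go through the cache so that repeated calls share nodes.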

alandrebraga commented 8 months ago

Hello! Reading the codebase, I think the solution is to reuse the parent directory if it already exists:

    try:
        # Reuse the parent directory if it is already cached
        directory = CACHE[parent_path]
    except KeyError:
        # Parent directory is not cached: create it and set loaded to False
        # (assumes `os` is imported at module level)
        directory = object.__new__(cls)
        directory.parent = Directory(os.path.dirname(parent_path))  # Still recursive
        directory.name = os.path.basename(parent_path)  # The parent's own name
        directory.loaded = False
        directory.__content__ = {}
        CACHE[parent_path] = directory

Is there a way to reproduce and test this solution? I'm new to this project.

luabida commented 8 months ago

Hello @alandrebraga! If I understood correctly, you are trying to test a local change in the code, right? The way I test locally is to open the file with IPython (pip install ipython) from a terminal. In your case the command would be ipython -i pysus/ftp/__init__.py (assuming you run it from the project's root directory). After executing it, you can use the classes directly without importing them. Please let me know if you have any questions.
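While testing in the IPython session, one way to check whether a change actually reduces FTP round trips, without touching the network, is to patch `ftplib.FTP.cwd` with `unittest.mock` and count the calls. The two functions below are hypothetical stand-ins for the old and proposed behavior, not PySUS code; applying this to the real `Directory` would also require stubbing the connection and login, which is not shown.

```python
from ftplib import FTP
from typing import Callable
from unittest import mock


def open_dir_recursive(ftp: FTP, path: str) -> None:
    """Hypothetical stand-in for the current behavior: one CWD per ancestor."""
    parts = [p for p in path.split("/") if p]
    for i in range(len(parts), -1, -1):
        ftp.cwd("/" + "/".join(parts[:i]))


def open_dir_once(ftp: FTP, path: str) -> None:
    """Hypothetical stand-in for the proposed behavior: a single CWD."""
    ftp.cwd(path)


def count_cwd(func: Callable[[FTP, str], None], path: str) -> int:
    """Run `func` and return how many times FTP.cwd was called."""
    # FTP() without a host does not open a connection, and the patched cwd
    # never sends a command, so this runs entirely offline
    with mock.patch.object(FTP, "cwd") as fake_cwd:
        func(FTP(), path)
        return fake_cwd.call_count
```

For a three-level path, the recursive stand-in issues four CWD commands (the path, its two parents and the root) while the proposed one issues a single command.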
