TUW-GEO / geopathfinder

Querying and searching data on the file system
MIT License

Enhance geopathfinder #29

Open cnavacch opened 2 years ago

cnavacch commented 2 years ago

Currently, I identify several points for improving the class logic and making the package more "pythonic".

  1. Folder naming conventions/classes and file naming conventions/classes are completely decoupled. A framework uniting both classes would make sense.
  2. Add magic methods like __str__ or __add__ (e.g. for adding a path to a tree) and properties like n_paths, n_files, or disk_usage to make objects easier to interact with. In particular, replace functions that only print, e.g. print_file_register, with something like https://pypi.org/project/seedir/
  3. I would prefer chained method calls that always return self, e.g. tree.filter(level, pattern='..').filter(level, pattern='..').prune(level), instead of all these "collect" functions.
  4. Temporary creation of data frames should be prevented. It would be better to have a central data frame dealing with folders and files (see get_disk_usage or search_files_ts)
  5. Building a tree is quite slow at the moment, because it uses os.walk and does not utilise parallelisation.
  6. Refactor build_smarttree in general - a lot of list appends happen there, even after one knows the "dimensions" of paths and folders.
  7. Regex patterns should be used as a general entry point for filtering folders or file names, not starting from a tuple of strings.
  8. It's currently quite difficult to understand how to use geopathfinder in detail. More docs and Jupyter Notebooks should be added.
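To make points 2, 3 and 7 more concrete, here is a minimal sketch of what such an interface could look like. Everything in it is hypothetical: `PathTree`, its `filter` signature and the `n_files` property are made up for illustration and are not part of geopathfinder's current API. It only works on path strings and does not touch the file system.

```python
import re
from pathlib import PurePosixPath


class PathTree:
    """Hypothetical container for file paths (not geopathfinder's API)."""

    def __init__(self, paths):
        # Store unique paths in sorted order.
        self._paths = sorted({PurePosixPath(p) for p in paths})

    @property
    def n_files(self):
        """Number of file paths currently held by the tree."""
        return len(self._paths)

    def filter(self, level, pattern):
        """Keep paths whose component at `level` matches the regex `pattern`.

        Modifies the tree in place and returns self so calls can be chained.
        """
        regex = re.compile(pattern)
        self._paths = [
            p for p in self._paths
            if level < len(p.parts) and regex.search(p.parts[level])
        ]
        return self

    def __add__(self, other):
        # Merging two trees yields a new tree with the union of their paths.
        return PathTree(self._paths + other._paths)

    def __str__(self):
        return "\n".join(str(p) for p in self._paths)


tree = PathTree([
    "data/S1/2020/a.tif",
    "data/S1/2021/b.tif",
    "data/S2/2020/c.tif",
])
# Chained, regex-based filtering on folder levels 1 and 2.
tree.filter(1, r"^S1$").filter(2, r"^202[01]$")
print(tree.n_files)
print(tree)
```

Whether `filter` should modify the object in place (as here) or return a filtered copy is a design decision we would still have to discuss; returning a copy is safer, returning self matches the chaining style proposed in point 3.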

This should just be the central issue for collecting and discussing improvements or new ideas, which can then be split into separate issues later on. Please feel free to add your ideas and thoughts; consider this a brainstorming thread. If we come up with a specific set of tasks, we could also ask a student or a new employee to implement them.

And by the way: I did not find a package that already does similar things - so this might be a huge benefit for the community!

raphaelquast commented 2 years ago

hey, just a quick comment concerning libraries that do similar things... I didn't check it in detail but this one might have some overlap (at least under the hood): https://github.com/sertit/eoreader

cnavacch commented 2 years ago

Thanks for pointing to this, @raphaelquast! I had never heard of this library before. I quickly browsed through it, but it seems to be very sensor and EO specific and tailored towards reading these data rather than dealing with the file system and folder trees. But it's definitely a good idea to keep looking for other projects doing similar things.

sebhahn commented 2 years ago

I started a file handling module in ascat (https://github.com/TUW-GEO/ascat/blob/master/src/ascat/file_handling.py)

Using preinstalled UNIX programs (e.g. find, locate) might be an option, or tools like https://github.com/junegunn/fzf or https://github.com/clvv/fasd
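As a rough sketch of the `find` idea: the helper below (the name `find_files` is made up for this example) delegates the search to the system's `find` when it is available and falls back to `os.walk` otherwise. It assumes file names do not contain newlines, and I have not benchmarked it against the current tree building.

```python
import fnmatch
import os
import shutil
import subprocess


def find_files(root, name_pattern="*"):
    """Return sorted file paths below `root` whose name matches a shell glob."""
    if shutil.which("find"):
        # Let the preinstalled UNIX tool do the directory traversal.
        result = subprocess.run(
            ["find", root, "-type", "f", "-name", name_pattern],
            capture_output=True, text=True, check=True,
        )
        return sorted(result.stdout.splitlines())
    # Fallback for systems without `find`: pure-Python traversal.
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if fnmatch.fnmatchcase(fname, name_pattern):
                matches.append(os.path.join(dirpath, fname))
    return sorted(matches)
```

A call like `find_files("/path/to/data", "*.tif")` would then return all matching file paths as strings, which could be fed into whatever tree structure we end up with.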