PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Feature: Automate Tracking of PEcAn Package Dependencies Across the Project #3286

Closed Sweetdevil144 closed 2 months ago

Sweetdevil144 commented 2 months ago

Description

Is your feature request related to a problem? Please describe.

Identifying which PEcAn sub-packages are used across the project is currently manual, error-prone, and inefficient. This matters especially given the goals of the GSoC project "Optimize PEcAn for freestanding use of single packages."

Proposed Solution

Describe the solution you'd like

I've developed a script that automatically scans the R scripts in the PEcAn project, identifies where PEcAn sub-packages are used, and outputs a CSV file listing the dependencies. This simplifies dependency tracking and supports the optimization and modularization efforts.
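As a rough illustration of the kind of scan described above (this is not the actual find_package_utilizations.R script, whose internals are not shown in this thread), here is a minimal Python sketch. It assumes dependencies show up as namespace-qualified calls such as PEcAn.logger::logger.info; the function names find_pecan_calls and dependencies_csv and the CSV column names are hypothetical.

```python
import csv
import io
import re

# Assumption: usage appears as namespace-qualified calls, e.g.
# PEcAn.logger::logger.info(...) or PEcAn.DB:::internal_helper(...).
PECAN_CALL = re.compile(r"\b(PEcAn\.[A-Za-z0-9.]+):::?([A-Za-z_.][A-Za-z0-9_.]*)")


def find_pecan_calls(source: str) -> set[tuple[str, str]]:
    """Return the (package, function) pairs referenced in one R source string."""
    calls = set()
    for line in source.splitlines():
        # Naive comment stripping: a '#' inside a string literal is also cut.
        code = line.split("#", 1)[0]
        calls.update(PECAN_CALL.findall(code))
    return calls


def dependencies_csv(files: dict[str, str]) -> str:
    """Build CSV text (file, package, function) from a {path: source} mapping."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "package", "function"])  # hypothetical column names
    for path in sorted(files):
        for package, function in sorted(find_pecan_calls(files[path])):
            writer.writerow([path, package, function])
    return out.getvalue()


# Illustrative input only; not taken from the PEcAn code base.
sample = {
    "a.R": (
        "s <- PEcAn.settings::read.settings(f)\n"
        "# PEcAn.DB::db.query(q, con)\n"
        'PEcAn.logger::logger.info("done")\n'
    )
}
print(dependencies_csv(sample))
```

In practice the mapping could be built from disk, for example with pathlib.Path(root).rglob("*.R") and read_text(). A regex scan like this only sees pkg::fn calls; functions brought in through library() or a NAMESPACE importFrom directive are not detected.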

Alternatives Considered

Describe alternatives you've considered

  1. Manual Tracking: Inefficient and unsustainable with project scale.
  2. Static Code Analysis Tools: Less customized, not directly tailored to PEcAn's needs.

Additional Context

mdietze commented 2 months ago

What you're suggesting already exists: https://github.com/PecanProject/pecan/blob/develop/docker/depends/pecan_package_dependencies.csv

Sweetdevil144 commented 2 months ago

What you're suggesting already exists:

Thanks. I was aware of the scripts/generate_dependencies.R file that generates this CSV. What I proposed was adding a script that lists all PEcAn packages, and their respective functions, that are used internally by other PEcAn packages. I now realise that this would just be a subset of what the original generate_dependencies.R script covers. Thanks for the correction. Below are links to my generated .csv file and my .R script for review:

https://github.com/Sweetdevil144/module-dependencies/blob/main/pecan_dependencies.csv

https://github.com/Sweetdevil144/module-dependencies/blob/main/find_package_utilizations.R

Sweetdevil144 commented 2 months ago

Another point I wanted to add: my custom pecan_dependencies.csv also records which functions are used from each imported package, which makes it easier to plan how we optimize the packages. The .csv still needs more cleanup, though (for example, removing ubiquitous imports such as PEcAn.logger, which is used only for logging; PEcAn.DB may be another candidate for removal).

infotroph commented 2 months ago

Being able to see which functions are called from which package does sound like a useful feature, though I have to say I’m much more often looking for all the functions a package calls from one particular dependency than for all the functions from all its dependencies. If this can support that use case while improving on the ergonomics of my default grep pkgname -R dirname, it could become a tool I reach for regularly.
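For illustration, the single-dependency lookup described above could be sketched roughly as follows. This is a minimal Python sketch, not part of the proposed script: the function name functions_used_from and the sample file contents are hypothetical, and it assumes calls are written in the pkg::fn form.

```python
import re
from collections import defaultdict


def functions_used_from(dependency: str, files: dict[str, str]) -> dict[str, list[str]]:
    """Map each function called as dependency::fn to the files that call it."""
    # The lookbehind keeps e.g. "myPEcAn.logger::x" from matching "PEcAn.logger".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_.])"
        + re.escape(dependency)
        + r":::?([A-Za-z_.][A-Za-z0-9_.]*)"
    )
    used = defaultdict(set)
    for path, source in files.items():
        for line in source.splitlines():
            # Naive comment stripping: a '#' inside a string literal is also cut.
            code = line.split("#", 1)[0]
            for function in pattern.findall(code):
                used[function].add(path)
    return {function: sorted(paths) for function, paths in sorted(used.items())}


# Illustrative input only; not taken from the PEcAn code base.
sources = {
    "R/a.R": "PEcAn.logger::logger.info('x')\nPEcAn.DB::db.query(q, con)\n",
    "R/b.R": "PEcAn.logger::logger.info('y'); PEcAn.logger::logger.severe('z')\n",
}
print(functions_used_from("PEcAn.logger", sources))
```

Compared with a plain grep, this groups the hits by function name, so the answer to "what do we import this dependency for?" is a short list rather than every matching line. It shares grep's blind spot for functions that are called without the pkg:: prefix.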

A few other limitations I see in the current implementation:

Overall I doubt I’d use it in its current form, but if it helps you, don’t let me stop you from using it! If you want to spend more time on it as a learning tool, I recommend thinking through how it could find all the functions from one arbitrary package.

infotroph commented 2 months ago

A higher-level comment: Knowing what functions we use from where is a great strategy for debugging and for planning refactoring, but I’m less sure it’s necessary to automate it the way this issue proposes. The times I’d use this script would be manual invocations while asking a focused question like “Ugh, [dependency] is causing installation problems, which functionality in [package] do we import it for? What would break if I remove it?” That’s usually easier to answer by searching for [dependency] on the fly than by looking it up in a big list of all the called functions.