PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org
Other
203 stars 235 forks source link

Make `data.atmosphere` (more) self-contained #2503

Open ashiklom opened 4 years ago

ashiklom commented 4 years ago

I propose that PEcAn.data.atmosphere be a general tool for downloading and processing meteorological data (including gapfilling, formatting to PEcAn standard, etc.), but not provide functions that are specific to the PEcAn workflow. if we can make data.atmosphere be self-contained, that will make it easier to install and test and make people more inclined to use it outside of the standard PEcAn workflow (which is its own form of testing).

The current PEcAn dependencies in data.atmosphere are PEcAn.DB and PEcAn.remote (and PEcAn.logger, but it's tiny and rarely changes, so I'm not including it). PEcAn.DB is primarily used in met.process and associated helpers. I propose we move met.process and friends to PEcAn.workflow, which should eliminate the dependency on PEcAn.DB and the need for a database.

PEcAn.remote is only used for its fqdn() for figuring out the current machine's hostname. I propose we either: (1) Create an internal copy of fqdn() that lives in data.atmosphere and is not exported (it's a small function unlikely to change in the future, so I think this is well worth eliminating a bulky, non-CRAN dependency); or (2) set the host information for remote downloads outside of the download.* functions (for example, make it an optional argument of these functions, or just modify the output data.frame outside of the download.* functions).

Thoughts?

mdietze commented 4 years ago

Sounds like a good plan to me.

crollinson commented 4 years ago

I fully support this and can try to provide support/help if needed. FWIW, I use most of the data.atmosphere functions outside of PEcAn already (resulting in me commenting out the functions you mention), so it'll be easier for me to keep PEcAn mainline functions up to date with my debugging if this happens.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.

moki1202 commented 3 years ago

@ashiklom can I work on this issue?

ashiklom commented 3 years ago

@moki1202 Since data.atmosphere is very tightly coupled to the overall PEcAn workflow, I would strongly suggest that you set yourself up to run the workflow with your version of the code first. Then, as you're working on this issue, you can run local workflows and make sure everything still works before submitting the pull request.

moki1202 commented 3 years ago

@ashiklom to remove dependancy on PEcAn.remote::fqdn() , I can just move the base\remote\R\fqdn file to modules\data.atmosphere and make appropriate adjustments in the description file right?

ashiklom commented 3 years ago

I would copy, not move, so that other code is not affected.

moki1202 commented 3 years ago

@ashiklom just noticed that this issue won't need much work once the packages are uploaded to CRAN right?

ashiklom commented 3 years ago

Maybe. But "once the packages are uploaded to CRAN" refers to what's likely a very long time in the future; possibly, never, depending on how that task goes. So I wouldn't necessarily count on it.

In my opinion, the opposite sequencing makes more sense --- that is, make this package (and other PEcAn modules) more mutually independent, then upload them one by one to CRAN.