Closed by skybristol 1 year ago
Hi @skybristol . I enjoyed the meeting too. Thanks for having us out there.
I agree that it makes sense to put an instance on usgs-bcb and start using pull requests. I'll need to explain some things (see below) and may want to clean some stuff up before we do that. More importantly, I'm still not clear on how all this (git) works, so I'll need some hand holding and teaching. My immediate questions are: Would you fork it into usgs-bcb? Would you then have to accept all the pull requests and review the code? How do we want to handle branches and version numbers? Whatever needs to be done is fine; I'm just trying to wrap my head around who does what and the relationships between copies and versions. It seems like it will get complicated.
Since the ownership, management, and use of the code are going to change, I should explain the background and its current state. The package was created by Thomas Laxson many years ago. He shared a copy of it just before leaving GAP. That version was tailored to his work environment in Moscow. It included paths to files etc. that wouldn't work anywhere else. I used it (not sure anyone else really did) on the virtual machines we had in Moscow and then Boise. At some point around the time the databases moved back to NCSU, I went through it to make it work in the NCSU environment and proceeded to use a handful of the functions heavily. During that revision, I quickly looked over all the functions and removed very few. I did not fully test all of them, only the ones that I used frequently. This is important to know because it means that you'll want to verify code as you start using it and expect that you'll eventually run into something that doesn't work. That said, I believe that most if not all of the code is solid, as Thomas was very meticulous and competent.
After the Software Carpentry workshop in Denver (2016?), I pulled out database passwords etc. (they now go in a config file) and put the code under git version control.
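For anyone picking the package up, here is a minimal sketch of the "passwords in a config file" pattern. It is not the actual GAPProduction code: the file name `gapconfig.ini`, the `[database]` section name, the key names, and the function `load_db_credentials` are all assumptions made up for illustration. The idea is only that credentials live in a local file that is excluded from git and are read at run time.

```python
import configparser
from pathlib import Path


def load_db_credentials(config_path, section="database"):
    """Return the key/value pairs from one section of an INI-style config file.

    The file is expected to stay out of version control (for example,
    listed in .gitignore). The section and key names are hypothetical.
    """
    path = Path(config_path)
    # ConfigParser.read() silently ignores missing files, so check explicitly.
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")

    # interpolation=None keeps literal '%' characters in passwords intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)

    if section not in parser:
        raise KeyError(f"Section [{section}] missing from {path}")
    return dict(parser[section])
```

With a hypothetical `gapconfig.ini` containing a `[database]` section with `user` and `password` entries, `load_db_credentials("gapconfig.ini")` would return those entries as a dictionary, so no credentials need to appear in the committed source.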
You brought up foundation boundary data. I'm assuming this was because of those datasets in the "data" folder and some functions. To some degree, that is a holdover from the past, and I agree that we should identify "official" versions and file locations rather than including them as they are now. In fact, I will go ahead and delete those datasets as a start. I haven't relied upon functions that use the non-huc boundaries yet and those hucs are "shucs", so I don't think there's an issue there.
If you could let me know the next step to getting an instance on usgs-bcb, that would be a great start.
Well, given the history and state of things that you describe, perhaps we should think about an incremental reengineering approach and just leave GAPProduction where it is. We could establish a new package or just build additional modules, functions, and classes into the BIS package that is starting to do a number of things for the Biogeographic Information System as a whole. For instance, once we get the GAP range data spun up and build out relationships between SHUCs and ecoregions, states, counties, LCCs, and some of the other standard areas of interest, we can work up a very quick and efficient function similar to gaprange.SppInAOI() with the same parameters but working a slightly different way against the online data. We'll tackle functions in this code as we get aspects of the GAP data suite online and working via APIs and you all can incrementally fit them into your workflows as they make sense.
I think that makes a lot of sense. Just build out the BIS package as you need/can and steal from GAPProduction when you can. It's likely that you'll write functions that make existing GAPProduction functions obsolete, which is fine, maybe ideal.
It may be that GAPProduction just provides you with some ideas about useful functionality or routine tasks you might encounter. Same for the GAPAnalysis function.
Just let me know if you need me to do anything for this.
Hi @nmtarr. Great meeting with you the last couple of days. I've been going through a few things here in the GAPProduction package after seeing your references to it in the WoodyWetland analysis package. It looks like there is a lot here in terms of regularly used functionality for the GAP team, and I'm wondering if it might be appropriate to put a shared instance of the code in the usgs-bcb org and start contributing to that via managed pull requests. As we discussed, that is kind of our online shared lab space where we are building out shared codebases. From there, we'll start moving things to a usgs-bis org once projects meet certain criteria in terms of the move from "offline/backend" data to online data methods. I'm looking at some areas in this codebase where we might start chipping around the edges at functionality that should be core to the Biogeographic Information System. One is establishing the official foundation for defining states and other boundaries so we are consistent across our analyses. Another is working out how the new data services for the 2001 habitat models (once we get those live) can be queried directly with those AOIs to return lists of species.
Please feel free to reach out to @aulenbac or me if you want to coordinate putting a copy of this code for collaborative work in the @usgs-bcb org.