brunobrr / bdc

Check out the vignettes with detailed documentation on each module of the bdc package
https://brunobrr.github.io/bdc
GNU General Public License v3.0

additional functionality for MOL internal uses #236

Open matthewsrogan opened 1 year ago

matthewsrogan commented 1 year ago

Hi Bruno, I am a data curator with the Map of Life and among my responsibilities is helping develop and apply cleaning workflows for global occurrence datasets. The bdc package mirrors an internal workflow we were developing, and so we are exploring using the bdc package as the core of our data prep pipeline for MOL and other projects within Yale’s Center for Biodiversity and Global Change. However, there are a few integral steps that we need that do not currently appear to be available within bdc. We are happy developing this extra functionality internally, but if you feel these steps have broader value, we would be happy to work with you to integrate them directly into the package.

I have written a few example functions that work within the bdc pipeline. mol_roi.R is a spatial filtering step that lets us flag data using spatial layers, such as our internal land layer derived from the Global Shoreline Vector. It is essentially a wrapper function with helpers for the different formats the spatial region of interest can take. mol_spatiotemporal_duplicate.R is a cleaning step we use to identify duplicate records prior to spatiotemporal filtering. It is especially important when we integrate overlapping datasets, such as GBIF and the California Consortium of Herbaria. mol_flagged_by_source.R is a function that flags records based on quality control attributes provided by the source (e.g. GBIF’s ‘issue’ attribute).
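To make the duplicate and source-flag steps concrete, here is a minimal sketch of the logic. It is not the contents of mol_spatiotemporal_duplicate.R or mol_flagged_by_source.R, and it is written in Python with only the standard library rather than in R. Everything in it is an assumption for illustration: the function names, the Darwin Core style field names, the rounding-based definition of "same place", and the convention that True means the record passes the check.

```python
def flag_spatiotemporal_duplicates(records, decimals=4):
    """Return one bool per record: True = first occurrence, False = duplicate.

    Assumption: two records are duplicates when they share species name,
    event date, and coordinates rounded to `decimals` decimal places.
    """
    seen = set()
    flags = []
    for rec in records:
        key = (
            rec["scientificName"].strip().lower(),
            round(rec["decimalLatitude"], decimals),
            round(rec["decimalLongitude"], decimals),
            rec["eventDate"],
        )
        flags.append(key not in seen)  # only the first record with this key passes
        seen.add(key)
    return flags


def flag_by_source_issues(records, blocking):
    """Return one bool per record: True = none of the blocking issues present.

    Assumption: the source's quality flags arrive as one semicolon-separated
    string in a hypothetical "issue" field.
    """
    flags = []
    for rec in records:
        issues = {i.strip() for i in rec.get("issue", "").split(";") if i.strip()}
        flags.append(issues.isdisjoint(blocking))
    return flags


# Made-up records: the first two describe the same collection event
# as reported by two overlapping sources.
records = [
    {"scientificName": "Quercus agrifolia", "decimalLatitude": 34.12341,
     "decimalLongitude": -118.54321, "eventDate": "2001-05-17", "issue": ""},
    {"scientificName": "Quercus agrifolia", "decimalLatitude": 34.12342,
     "decimalLongitude": -118.54322, "eventDate": "2001-05-17",
     "issue": "COUNTRY_COORDINATE_MISMATCH;GEODETIC_DATUM_ASSUMED_WGS84"},
    {"scientificName": "Quercus agrifolia", "decimalLatitude": 34.12341,
     "decimalLongitude": -118.54321, "eventDate": "2003-06-02",
     "issue": "GEODETIC_DATUM_ASSUMED_WGS84"},
]

duplicate_flags = flag_spatiotemporal_duplicates(records)
source_flags = flag_by_source_issues(records, {"COUNTRY_COORDINATE_MISMATCH"})
print(duplicate_flags)
print(source_flags)
```

Rounding is a crude stand-in for a proper distance tolerance: two points a metre apart can still fall on opposite sides of a rounding boundary, so a real implementation would likely compare distances directly.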

Other functions we need for our workflows include one to flag vagrant occurrences and geographic outliers based on their distance from the species’ range, and another to identify non-native occurrences such as plants in gardens, animals in zoos, or domesticated species and variants.
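The distance-based outlier idea can be sketched roughly as follows. This is an illustration under strong simplifying assumptions, not an existing bdc or MOL function: the species' range is reduced to a hypothetical list of reference points instead of a range polygon, distance is great-circle (haversine) distance to the nearest reference point, and the 500 km cutoff is an arbitrary placeholder.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_range_outliers(occurrences, range_points, max_km=500.0):
    """Return one bool per occurrence: True = within max_km of the range.

    Assumption: `range_points` is a non-empty list of (lat, lon) tuples
    standing in for the species' range.
    """
    flags = []
    for lat, lon in occurrences:
        nearest = min(haversine_km(lat, lon, rlat, rlon) for rlat, rlon in range_points)
        flags.append(nearest <= max_km)
    return flags


# Made-up example: a range in southern California, one nearby record
# and one record in London.
range_points = [(34.0, -118.0), (36.0, -120.0)]
occurrences = [(34.5, -118.5), (51.5, -0.1)]
outlier_flags = flag_range_outliers(occurrences, range_points)
print(outlier_flags)
```

With real range polygons, the nearest-point search would be replaced by a point-to-polygon distance from a spatial library, and the cutoff would probably need to vary by taxon.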

Either way, we will continue to develop this functionality for our internal uses. But if you feel any of these steps are of value to the package’s user base, we are eager to work with you to make sure they work seamlessly with the package and that they pass all testing requirements. From our perspective, the more centralized everything is within bdc, the better.

The package is intuitive and meets our needs well, so we appreciate what you and your collaborators have done to develop it.

Cheers, Matt Rogan

brunobrr commented 1 year ago

Hi Matt,

My apologies for the delay in getting back to you. Thank you very much. I'm sure these functions will be of great value to bdc's users.

The bdc team will check the functions carefully, but I believe they can be incorporated into bdc seamlessly.

It will be our pleasure to invite you to be a collaborator of bdc here on GitHub and on future publications.

Cheers, Bruno