Generate taxonomic checklists and occurrence collections from biodiversity data sources such as GBIF and iDigBio. Converts Darwin Core Archives (DwCA) tracked by Preston into Parquet and sequence files to enable parallel processing in a compute cluster.
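
To make the idea of a taxonomic checklist concrete, here is a minimal, single-machine sketch in plain Python. It is not this library's Spark implementation: the records are made up, and `in_bounding_box` and `build_checklist` are hypothetical helper names chosen for illustration. Only the field names (`scientificName`, `decimalLatitude`, `decimalLongitude`) are standard Darwin Core terms. The sketch assumes a checklist is the list of distinct taxa, with record counts, among the occurrences that fall inside a bounding box.

```python
from collections import Counter

# Made-up occurrence records keyed by Darwin Core term names.
occurrences = [
    {"scientificName": "Puma concolor", "decimalLatitude": 37.8, "decimalLongitude": -122.3},
    {"scientificName": "Puma concolor", "decimalLatitude": 37.9, "decimalLongitude": -122.1},
    {"scientificName": "Lynx rufus", "decimalLatitude": 37.7, "decimalLongitude": -122.4},
    {"scientificName": "Ursus arctos", "decimalLatitude": 61.2, "decimalLongitude": -149.9},
]


def in_bounding_box(record, south, west, north, east):
    """Return True if the record's coordinates fall inside the box (inclusive)."""
    return (
        south <= record["decimalLatitude"] <= north
        and west <= record["decimalLongitude"] <= east
    )


def build_checklist(records, south, west, north, east):
    """Count records per scientific name inside the box.

    Returns (name, count) pairs, most frequent first, ties broken by name.
    """
    counts = Counter(
        r["scientificName"]
        for r in records
        if in_bounding_box(r, south, west, north, east)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


checklist = build_checklist(occurrences, south=37.0, west=-123.0, north=38.0, east=-121.0)
for name, count in checklist:
    print(f"{name}\t{count}")
```

On a cluster, the same filter-then-aggregate shape would run over partitioned files rather than an in-memory list, which is why a columnar, splittable format is useful for this kind of workload.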
This library relies on Apache Spark and Mesos/HDFS clusters to:
At the time of writing (June 2017), this library is used by http://effechecka.org and https://gimmefreshdata.github.io . Note that the effechecka and freshdata projects are no longer active.
This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.