BU-Spark / se-Symbiota-portal

The Symbiota Virtual Flora/Fauna project is an open source software project with the central goal of developing online tools that aid in the generation, exploration, and management of biodiversity data (collection specimens, observations, images, checklists, keys, etc.). See also: http://bdj.pensoft.net/articles.php?id=1114 and http://symbiota.org/
GNU General Public License v2.0

Design a FTP ingestion protocol #71

Open Tian-Tan opened 2 months ago

Tian-Tan commented 2 months ago

There are two ways to approach designing an FTP ingestion protocol:

  1. Architect a new ingestion method that takes images directly from a directory and imports them into the system. This approach will take a long time, because Symbiota's ingestion code is long and spread across many classes and files. It may be hard to understand the whole structure and build a new ingestion method in a short period of time.
  2. Package the images into a Darwin Core Archive (DWCA) and use that DWCA file for ingestion. This method is preferred, because the images -> DWCA pipeline can be built separately from the existing Symbiota architecture, which removes the need to build on top of the currently complicated ingestion code. The workflow can look like this:
    • The user uploads a batch of images into a directory over FTP
    • A script is run with a command such as python3 script.py /directory -o outputFileName
    • The output file is a DWCA, which can be used with Symbiota's DWCA import function
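
To make the second approach concrete, here is a minimal sketch of what script.py could look like. It is not an agreed design, and everything beyond the command shape above is an assumption: each image's filename stem is treated as the specimen's catalog number, images are referenced by URL through a hypothetical --base-url option (a DWCA normally links to media rather than embedding it), and the archive uses an Occurrence core with the GBIF Simple Multimedia extension. Whether Symbiota's DWCA importer accepts exactly this layout has not been verified.

```python
import argparse
import csv
import io
import mimetypes
import zipfile
from pathlib import Path

# Assumption: only these file types count as specimen images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

# Archive descriptor: an Occurrence core plus a Simple Multimedia extension.
CSV_ATTRS = ('encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n" '
             "fieldsEnclosedBy='\"' ignoreHeaderLines=\"1\"")
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core {CSV_ATTRS} rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrences.csv</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/catalogNumber"/>
  </core>
  <extension {CSV_ATTRS} rowType="http://rs.gbif.org/terms/1.0/Multimedia">
    <files><location>multimedia.csv</location></files>
    <coreid index="0"/>
    <field index="1" term="http://purl.org/dc/terms/identifier"/>
    <field index="2" term="http://purl.org/dc/terms/format"/>
  </extension>
</archive>
"""


def find_images(directory):
    """Return the image files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def to_csv(header, rows):
    """Serialize a header and rows to CSV text with "\\n" line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_archive(directory, output, base_url=""):
    """Write a DWCA zip describing the images; return the image count."""
    images = find_images(directory)
    # Assumption: the filename stem doubles as occurrenceID and catalogNumber.
    ids = sorted({img.stem for img in images})
    occurrences = [[i, i] for i in ids]
    media = [
        [img.stem, base_url + img.name, mimetypes.guess_type(img.name)[0] or ""]
        for img in images
    ]
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrences.csv",
                    to_csv(["occurrenceID", "catalogNumber"], occurrences))
        zf.writestr("multimedia.csv",
                    to_csv(["coreid", "identifier", "format"], media))
    return len(images)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Package images into a DWCA.")
    parser.add_argument("directory", help="directory the images were uploaded to")
    parser.add_argument("-o", "--output", required=True, help="output DWCA file")
    # Hypothetical option: public URL prefix under which the images are served.
    parser.add_argument("--base-url", default="", help="URL prefix for image links")
    args = parser.parse_args(argv)
    count = build_archive(args.directory, args.output, args.base_url)
    print(f"Wrote {count} image record(s) to {args.output}")

# To run this as a script, call main() under an `if __name__ == "__main__":` guard.
```

Keeping build_archive separate from the argument parsing means the same function could later be called from a cron job or an upload hook instead of by hand.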

While the basic workflow will look like the one above, some enhancements such as

are also possible.