dina-web-nrm / classifications-docker

Integration project for PlutoF Classifications module - providing a docker application with a set of containers, including tools for loading data
GNU Affero General Public License v3.0

Batch load from .csv #7

Open mskyttner opened 7 years ago

mskyttner commented 7 years ago

Looking for a way to load a batch of data from a .csv file as a single transaction, with the ability to roll all of it back if anything fails.

Thinking about this use case: "When loading Dyntaxa, if something breaks on row number x of the input data file, how can the previous rows be rolled back, ensuring no state anywhere is left compromised?"

So I'm looking for batch transactions, preferably exposed through the API.

An idea for a simplistic proof-of-concept implementation would be an API endpoint that accepts upload of a .csv, initially in the current format required by the PlutoF-taxonomy module (later ideally also for DwC-formatted data - see issue #2).
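As a sketch of what such an endpoint might do before touching the database, the whole file could be validated up front so that one bad row rejects the entire batch. This is only an illustration: the column names ("name", "rank", "parent") are placeholders, not the actual PlutoF-taxonomy format.

```python
import csv
import io

# Placeholder columns - NOT the real PlutoF-taxonomy .csv format.
REQUIRED_COLUMNS = {"name", "rank", "parent"}

def validate_csv(text):
    """Parse the whole .csv; return (rows, errors).

    The batch should only be loaded when errors is empty."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [], ["missing columns: %s" % ", ".join(sorted(missing))]
    rows, errors = [], []
    # start=2: line 1 of the file is the header row
    for lineno, row in enumerate(reader, start=2):
        if not row["name"].strip():
            errors.append("row %d: empty name" % lineno)
        rows.append(row)
    return rows, errors
```

Usage would be along the lines of `rows, errors = validate_csv(uploaded_text)`, responding with the collected errors if any, and otherwise handing `rows` to a single-transaction loader.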

Perhaps the upload could then happen as one database transaction in Postgres rather than as several row-by-row operations, in the spirit of something like this in SQL:

BEGIN;
COPY checklist_tmp FROM '/path/to/csv/dyntaxa.csv' DELIMITER ',' CSV;
-- (do further processing here)
COMMIT;

This could perhaps also give a considerable performance improvement over the current row-by-row loading approach?
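The all-or-nothing semantics can be sketched without a running Postgres instance; here sqlite3 (Python stdlib) stands in for it, and the table/column names are the same placeholders as above. All rows are inserted inside one transaction, and any failure rolls the whole batch back, leaving the table exactly as it was.

```python
import csv
import io
import sqlite3

def load_batch(conn, csv_text):
    """Insert every row of csv_text in ONE transaction.

    Returns True on commit; on any error the whole batch is
    rolled back and False is returned - no partial state remains."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    try:
        # the connection context manager commits on success
        # and rolls back if any statement raises
        with conn:
            conn.executemany(
                "INSERT INTO checklist_tmp (name, rank) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False
```

With Postgres the same pattern would wrap COPY (or a bulk INSERT) between BEGIN and COMMIT, issuing ROLLBACK on the first error, so that a failure on row x of Dyntaxa leaves no trace of rows 1..x-1.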

mskyttner commented 7 years ago

Related to #1 (but differently expressed).