Integrate option to deduplicate data before tiling

Currently, deduplication in the visualization workflow can only start after the input data has been staged and tiled. If deduplication is set to occur at any step in the workflow (staging, rasterization, and/or 3D tiling), the duplicate rows are first flagged with a boolean attribute, and the polygons for which that attribute is True are then removed at the specified step.
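To make the current flag-then-remove behavior concrete, here is a minimal sketch in plain Python. It is not the viz-staging implementation: the attribute name `is_duplicate`, the `geometry_key` field, the step labels, and both helper functions are hypothetical, and the "first row wins" rule stands in for whatever deduplication method is configured.

```python
# Hypothetical attribute name; the real flag column in the workflow may differ.
DUPLICATE_FLAG = "is_duplicate"


def flag_duplicates(rows, key):
    """Return copies of the rows with a boolean duplicate flag.

    The first row seen for each key value is kept (flag False);
    later rows with the same key value are flagged True.
    """
    seen = set()
    flagged = []
    for row in rows:
        value = row[key]
        flagged.append({**row, DUPLICATE_FLAG: value in seen})
        seen.add(value)
    return flagged


def remove_flagged(rows, step, deduplicate_at):
    """Drop flagged rows only if deduplication is configured for this step."""
    if step not in deduplicate_at:
        return rows
    return [row for row in rows if not row[DUPLICATE_FLAG]]


# Illustrative data: rows 1 and 3 describe the same polygon.
rows = [
    {"id": 1, "geometry_key": "A"},
    {"id": 2, "geometry_key": "B"},
    {"id": 3, "geometry_key": "A"},
]
flagged = flag_duplicates(rows, "geometry_key")

# With deduplication set to the rasterization step only, the flagged
# row survives staging and is removed at rasterization.
kept_at_staging = remove_flagged(flagged, "staging", ["raster"])
kept_at_raster = remove_flagged(flagged, "raster", ["raster"])
```

The point of the sketch is that flagging and removal are separate: every row carries the flag from the moment duplicates are identified, but nothing is dropped until the configured step is reached.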

For some datasets, deduplicating the data before it is tiled could be beneficial. For example, Ingmar Nitze's Arctic lake change dataset is composed of UTM zones that overlap at the edges, and he prefers to have the data deduplicated before it is input into the viz-workflow. That way, users have access to only the deduplicated data, whether they are interested in the viz output (tilesets of lakes) or in the input data itself.

This functionality is in the exploratory phase. An example of applying the neighbor deduplication approach to non-tiled data can be found in this issue. One way to integrate this functionality into the viz-staging package is to expand the accepted values for the deduplication options in the config. For example, deduplicate_at could accept a new option like "before_tiling". Beyond this added flexibility in the config, certain pre-deduplication steps would also need to happen, such as adding a source_file attribute to the input data.
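As a rough sketch of what this could look like, the snippet below validates a `deduplicate_at` value that may include the proposed "before_tiling" option and adds a `source_file` attribute to each input row. Only `deduplicate_at`, "before_tiling", and `source_file` come from the proposal above; the other step labels, the function names, the file paths, and the use of plain dictionaries in place of real geospatial data are assumptions made for illustration.

```python
from pathlib import Path

# "before_tiling" is the option proposed above. The other step labels
# are assumptions and may not match the package's actual config values.
ALLOWED_STEPS = {"before_tiling", "staging", "raster", "3dtiles"}


def validate_deduplicate_at(value):
    """Normalize deduplicate_at to a list and reject unknown steps."""
    if value is None:
        return []
    steps = [value] if isinstance(value, str) else list(value)
    unknown = sorted(set(steps) - ALLOWED_STEPS)
    if unknown:
        raise ValueError(f"Unknown deduplicate_at value(s): {unknown}")
    return steps


def add_source_file(rows, path):
    """Return copies of the rows tagged with the name of their input file."""
    name = Path(path).name
    return [{**row, "source_file": name} for row in rows]


def prepare_input(files, config):
    """Tag all rows with source_file and report whether to dedup pre-tiling."""
    steps = validate_deduplicate_at(config.get("deduplicate_at"))
    rows = []
    for path, file_rows in files.items():
        rows.extend(add_source_file(file_rows, path))
    return rows, "before_tiling" in steps


# Hypothetical config and input files (one file per overlapping zone).
config = {"deduplicate_at": ["before_tiling"]}
files = {
    "data/zone_a.gpkg": [{"id": 1}],
    "data/zone_b.gpkg": [{"id": 2}],
}
rows, dedup_before_tiling = prepare_input(files, config)
```

Tagging each row with `source_file` before any deduplication runs means overlapping polygons can be traced back to the file they came from, which a neighbor-based comparison between files would need.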

PermafrostDiscoveryGateway / viz-staging #55