Open chuckwondo opened 2 years ago
One approach to consider would be to leverage `proxyquire` to "inject" custom logic for the `list` method of the "s3" protocol provider in Cumulus. This could possibly be done by reusing our existing logic that adds the collection name as a prefix to the granule IDs, but rather than applying it after discovery is complete, we would "inject" the prefixing logic into a custom `list` method implementation, or subclass the Cumulus `S3ProviderClient` class and override the `list` method. This would also require `proxyquire` to override the `providerClientUtils.buildProviderClient` function to "intercept" use of the "s3" protocol so that it uses our subclass.
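The subclass idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Cumulus API: the `S3ProviderClient` class here is a hypothetical stand-in so the snippet runs without Cumulus installed, the `{ name, path }` entry shape returned by `list` is assumed, and `PrefixingS3ProviderClient` and `makeBuildProviderClient` are made-up names.

```javascript
// Hypothetical stand-in for the Cumulus S3ProviderClient so that this sketch
// is self-contained. The { name, path } entry shape is an assumption.
class S3ProviderClient {
  constructor({ bucket, objects = [] }) {
    this.bucket = bucket;
    this.objects = objects;
  }

  // Pretend to list every object found under the given path.
  async list(path) {
    return this.objects.filter((object) => object.path === path);
  }
}

// Subclass that adds the collection prefix while listing, i.e. before any
// duplicate check can see the names.
class PrefixingS3ProviderClient extends S3ProviderClient {
  constructor({ prefix, ...rest }) {
    super(rest);
    this.prefix = prefix;
  }

  async list(path) {
    const entries = await super.list(path);

    // Leave names that already carry the prefix untouched.
    return entries.map((entry) => ({
      ...entry,
      name: entry.name.startsWith(this.prefix)
        ? entry.name
        : `${this.prefix}${entry.name}`,
    }));
  }
}

// Hypothetical wrapper around the original buildProviderClient: it hands out
// the subclass for the "s3" protocol and defers to the original otherwise.
function makeBuildProviderClient(originalBuildProviderClient, prefix) {
  return (providerConfig) =>
    providerConfig.protocol === 's3'
      ? new PrefixingS3ProviderClient({ ...providerConfig, prefix })
      : originalBuildProviderClient(providerConfig);
}
```

The wrapped function would then be the value supplied to `proxyquire` as the stub for `buildProviderClient`. The module paths to pass to `proxyquire` depend on how the Cumulus packages are laid out, so they are deliberately left out here.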
For the `PSScene3Band` collection, when "duplicateHandling" is set to "skip" (rather than "replace") to avoid unnecessary ingestion (and related costs), the DiscoverGranules step of the DiscoverAndQueueGranules workflow fails with "granule not found" errors. This is for the same reason as #32. We must somehow prefix the granule IDs with `PSScene3Band-` before discovery checks for duplicates, but this is a harder task than the fix for #32 because Cumulus provides no means to insert custom logic between the "list granules" step and the "check for duplicates" step, so we cannot tweak the granule IDs after they are listed but before they are checked for duplicates.

Acceptance criteria: Configuring "duplicateHandling" as "skip" on the `PSScene3Band` collection does not produce "granule not found" errors during discovery, and properly skips granules that have already been ingested. The logic should also work for other collections, but given that we currently have only the `PSScene3Band` collection available, testing against other collections is not required at this point.