AllenNeuralDynamics / aind-data-asset-indexer

Check if s3 prefix format matches expected regex #63

Closed helen-m-lin closed 3 months ago

helen-m-lin commented 4 months ago

User story

As a user, I want to see metadata records only from s3 prefixes that match the expected format, so that invalid folders are ignored.

Note that in the lambda function, invalid s3 prefixes are already ignored.

Acceptance criteria

A valid s3 prefix should be in the format: {modality}_{id}_{acq_datetime}

[x] Given that the populate job is run, an s3 prefix with an invalid format should not be processed.
[x] Given that the bucket indexer job is run, an s3 prefix with an invalid format should not be processed.

Sprint Ready Checklist

[x] 1. Acceptance criteria defined
[x] 2. Team understands acceptance criteria
[x] 3. Team has defined solution / steps to satisfy acceptance criteria
[x] 4. Acceptance criteria are verifiable / testable
[x] 5. External / 3rd Party dependencies identified
[x] 6. Ticket is prioritized and sized

Notes

We can check prefixes against the DATA regex pattern from aind-data-schema:

DATA = f"^(?P<label>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
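A minimal sketch of how such a check might look, using only the standard library. The DATE_PATTERN and TIME_PATTERN values below are assumed stand-ins for RegexParts.DATE.value and RegexParts.TIME.value (taken here to be YYYY-MM-DD and HH-MM-SS), and is_valid_prefix and filter_valid_prefixes are hypothetical helper names, not functions from the indexer or from aind-data-schema.

```python
import re

# Assumption: stand-ins for RegexParts.DATE.value and RegexParts.TIME.value.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"
TIME_PATTERN = r"\d{2}-\d{2}-\d{2}"

# Same shape as the DATA pattern quoted in the notes above.
DATA_PATTERN = re.compile(
    rf"^(?P<label>.+?)_(?P<c_date>{DATE_PATTERN})_(?P<c_time>{TIME_PATTERN})$"
)


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the s3 prefix matches the expected name format."""
    # Hypothetical helper: drop leading/trailing slashes before matching.
    name = prefix.strip("/")
    return DATA_PATTERN.match(name) is not None


def filter_valid_prefixes(prefixes):
    """Keep only the prefixes that match the expected format."""
    return [p for p in prefixes if is_valid_prefix(p)]
```

Either job could call a helper like this before processing a prefix, so that folders with non-matching names are skipped rather than indexed.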

helen-m-lin commented 3 months ago

~250 invalid prefixes in s3 (mostly in aind-ophys-data) are filtered out by the indexer, so they would not show up in docdb.