AllenNeuralDynamics / aind-data-asset-indexer

MIT License

Indexer should include processed Code Ocean results #38

Status: Closed (dyf closed this 3 weeks ago)

dyf commented 3 months ago

The indexer needs to include assets that are processed results in the Code Ocean datasets bucket. This is needed so that science teams can analyze data as soon as it is processed, regardless of whether we are capturing it to an external bucket.

Acceptance Criteria

jtyoung84 commented 2 months ago

Only these buckets:

Dev account bucket: codeocean-s3datasetsbucket-eg0euwi4ez6z
Prod account bucket: codeocean-s3datasetsbucket-1u41qdg42ur9
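A minimal sketch of pinning the job to just these two buckets. It assumes the job learns its environment from a string such as "dev" or "prod"; the environment names and the `bucket_for_env` helper are hypothetical, and only the bucket names come from this thread.

```python
# Hypothetical environment keys; the bucket names are the ones listed in this issue.
CODE_OCEAN_DATASET_BUCKETS = {
    "dev": "codeocean-s3datasetsbucket-eg0euwi4ez6z",
    "prod": "codeocean-s3datasetsbucket-1u41qdg42ur9",
}


def bucket_for_env(env: str) -> str:
    """Return the Code Ocean datasets bucket for an environment; reject anything else."""
    try:
        return CODE_OCEAN_DATASET_BUCKETS[env.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown environment {env!r}; expected one of "
            f"{sorted(CODE_OCEAN_DATASET_BUCKETS)}"
        ) from None
```

Failing loudly on an unknown environment keeps the job from ever scanning a bucket outside the two named here.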

Have a job that runs on a schedule (let's start with every two hours) that scans the bucket, pulls information from the Code Ocean index using aind-codeocean-api and the service-account-token, builds a metadata record, and pushes that record to the DocDb index. Before building records, it should check the DocDb index and skip any assets that have already been processed.
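A rough sketch of one scheduled run, not the indexer's actual implementation. The S3 listing, the aind-codeocean-api lookup, and the DocDb query and write are passed in as plain callables so the filtering and record-building logic stands on its own. It assumes each top-level prefix in the bucket is one data asset, that DocDb records can be matched by an `s3://bucket/prefix` `location` field, and that the Code Ocean asset info exposes `id`, `name`, and a Unix `created` timestamp. The record fields and all function names are hypothetical.

```python
from collections.abc import Callable, Iterable
from datetime import datetime, timezone


def s3_location(bucket: str, prefix: str) -> str:
    """Build the s3:// URL used to match records in DocDb (assumed convention)."""
    return f"s3://{bucket}/{prefix.strip('/')}"


def find_new_prefixes(
    bucket: str, prefixes: Iterable[str], indexed_locations: set[str]
) -> list[str]:
    """Normalize prefixes and drop any whose location is already indexed."""
    normalized = {p.strip("/") for p in prefixes if p.strip("/")}
    return sorted(p for p in normalized if s3_location(bucket, p) not in indexed_locations)


def build_record(bucket: str, prefix: str, asset_info: dict) -> dict:
    """Assemble a minimal metadata record; field names here are hypothetical."""
    created = datetime.fromtimestamp(asset_info["created"], tz=timezone.utc)
    return {
        "name": asset_info["name"],
        "location": s3_location(bucket, prefix),
        "created": created.isoformat(),
        "code_ocean_data_asset_id": asset_info["id"],
    }


def run_once(
    bucket: str,
    list_prefixes: Callable[[], Iterable[str]],
    fetch_indexed_locations: Callable[[], set[str]],
    fetch_asset_info: Callable[[str], dict],
    upsert: Callable[[dict], None],
) -> list[str]:
    """One scheduled pass: check DocDb first, then index only the new prefixes."""
    indexed = fetch_indexed_locations()
    new_prefixes = find_new_prefixes(bucket, list_prefixes(), indexed)
    for prefix in new_prefixes:
        upsert(build_record(bucket, prefix, fetch_asset_info(prefix)))
    return new_prefixes
```

In a real job, `list_prefixes` might wrap a delimited S3 listing and `fetch_asset_info` a Code Ocean API call authenticated with the service-account token; those wirings are deliberately left out. Querying DocDb once per run, before any Code Ocean calls, keeps already-indexed assets from costing an API request.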

mekhlakapoor commented 1 month ago

We can do this in a separate ECS container if it's easy.
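If the job runs as its own ECS task, one common way to get the every-two-hours cadence is an EventBridge schedule rule that targets the task. The sketch below only builds the keyword arguments for boto3's `events.put_rule`; the rule name and description are hypothetical, and attaching the ECS task would be a separate `put_targets` call with `EcsParameters`.

```python
def schedule_expression(hours: int) -> str:
    """Return an EventBridge rate expression; the unit is singular only for a value of 1."""
    if hours < 1:
        raise ValueError("hours must be a positive integer")
    unit = "hour" if hours == 1 else "hours"
    return f"rate({hours} {unit})"


def put_rule_kwargs(rule_name: str, hours: int = 2) -> dict:
    """Keyword arguments for boto3's events.put_rule (rule name is hypothetical)."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression(hours),
        "State": "ENABLED",
        "Description": "Index Code Ocean processed results into DocDb",
    }
```

Usage would look like `boto3.client("events").put_rule(**put_rule_kwargs("aind-co-results-indexer"))`. Keeping the interval as a parameter makes it easy to adjust the two-hour starting point later.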