AllenNeuralDynamics / aind-data-access-api

Library to interface with AIND databases
MIT License

default `paginate_batch_size=1000` on `MetadataDbClient.retrieve_docdb_records` returns 500 error on first page #82

Closed (dyf closed 2 months ago)

dyf commented 3 months ago
# Default paginate_batch_size (1000): the first page fails with a 500 error
results = self.doc_db_client.retrieve_docdb_records(
    filter_query={"name": {"$regex": "^SmartSPIM_.*stitched_.*"}},
)
len(results)  # 483, with logging errors

# Explicit paginate_batch_size=500: every page succeeds
results = self.doc_db_client.retrieve_docdb_records(
    filter_query={"name": {"$regex": "^SmartSPIM_.*stitched_.*"}},
    paginate_batch_size=500,
)
len(results)  # 1483, no errors

helen-m-lin commented 3 months ago

After investigation, we found that SmartSPIM records can be arbitrarily large, so a page of 1000 can exceed the 6 MB response payload limit in AWS Lambda even when compressed. We can add a check on the response size and improve the error handling: return a detailed error and suggest that the user apply a projection or decrease the page size.
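
For illustration only, a minimal sketch of what such a size check could look like. This is not the library's actual implementation: `check_response_size` and `ResponseTooLargeError` are hypothetical names, the records are assumed to be JSON-serializable, gzip is assumed as the compression, and the 6 MB limit is taken as 6 * 1024 * 1024 bytes.

```python
import gzip
import json

# Assumption: the 6 MB Lambda response limit expressed in bytes.
LAMBDA_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024


class ResponseTooLargeError(Exception):
    """Hypothetical error raised when a page would exceed the payload limit."""


def check_response_size(records, limit_bytes=LAMBDA_PAYLOAD_LIMIT_BYTES):
    """Return the gzip-compressed payload, or raise with an actionable message."""
    body = gzip.compress(json.dumps(records).encode("utf-8"))
    if len(body) > limit_bytes:
        raise ResponseTooLargeError(
            f"Compressed response is {len(body)} bytes for {len(records)} records, "
            f"which exceeds the {limit_bytes}-byte limit. "
            "Use a projection to return fewer fields, or decrease paginate_batch_size."
        )
    return body
```

With a check like this, an oversized page would surface as a descriptive error rather than a generic 500.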

dyf commented 3 months ago

@helen-m-lin Agreed. Let's also change the default page size to 500.
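
To show how the page size affects the number of requests, here is a rough sketch of skip/limit pagination over an in-memory stand-in. The `paginate` and `fetch_page` helpers and the record contents are hypothetical, and the client's real pagination logic may differ.

```python
DEFAULT_PAGE_SIZE = 500  # the smaller default proposed in this thread


def paginate(fetch_page, page_size=DEFAULT_PAGE_SIZE):
    """Collect all records by requesting page_size records at a time."""
    results = []
    skip = 0
    while True:
        page = fetch_page(skip=skip, limit=page_size)
        results.extend(page)
        # A short page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        skip += page_size
    return results


# In-memory stand-in for the database (hypothetical data).
records = [{"name": f"SmartSPIM_{i}"} for i in range(1483)]
calls = []


def fetch_page(skip, limit):
    """Record each request, then return the matching slice of records."""
    calls.append((skip, limit))
    return records[skip:skip + limit]


all_records = paginate(fetch_page)
print(len(all_records), len(calls))  # prints "1483 3"
```

With 1483 matching records, a page size of 500 takes three requests. Each response carries at most 500 records, so it is much less likely to hit the payload limit than a single 1000-record page.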