When searching DNAnexus for all BAMs and their describe details in a project here: https://github.com/eastgenomics/Genetics_Ark/blob/7a53b8fa0d374234170ede239e37269c23946f88/cron/find_dx_data.py#L124
we could limit the describe fields returned to only those required. This should reduce the size of the response, lowering bandwidth load and making querying faster. The only fields we should need from the describe details are `folder`, `name` and `archivalState`.
Example of current response for single object:
Limiting this to required fields:
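A minimal sketch of how the query could request only these fields. It relies on `dxpy.find_data_objects` accepting a dict for `describe`, which is passed through as the describe call input. The helper names (`build_describe`, `find_bams`) and the `*.bam` glob are assumptions for illustration, and the existing call in `find_dx_data.py` may use different arguments.

```python
# Fields named in this issue as the only ones needed downstream.
REQUIRED_FIELDS = ("name", "folder", "archivalState")


def build_describe(fields=REQUIRED_FIELDS):
    """Build a describe input that restricts the returned fields."""
    return {"fields": {field: True for field in fields}}


def find_bams(project_id):
    """Find BAM files in a project, returning only the required fields.

    Hypothetical helper: the name pattern and arguments are assumptions,
    not the exact query used in find_dx_data.py.
    """
    import dxpy  # imported here so build_describe() works without dxpy

    return list(dxpy.find_data_objects(
        project=project_id,
        classname="file",
        name="*.bam",
        name_mode="glob",
        describe=build_describe(),
    ))
```

Each returned item should then still have `id` and `project`, with `describe` cut down to the requested fields.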
Checking size differences of responses for a random 002 project of BAM files:
This gives ~60% reduction in the size of the response, and the same can be done for the query to find index files too.
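One rough way to make this kind of comparison is to serialise both result lists to JSON and compare byte counts. This is only a sketch: the function names are hypothetical, and the serialised size approximates rather than exactly matches what is sent over the network.

```python
import json


def response_size(results):
    """Size in bytes of the results serialised as compact JSON."""
    return len(json.dumps(results, separators=(",", ":")).encode("utf-8"))


def percent_reduction(full_results, limited_results):
    """Percentage by which the limited response is smaller than the full one."""
    full_size = response_size(full_results)
    return 100 * (full_size - response_size(limited_results)) / full_size
```

Here `full_results` and `limited_results` would be the lists returned by the same query with and without the field restriction.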
In addition, this call to find the CNV index happens for every CNV BAM: https://github.com/eastgenomics/Genetics_Ark/blob/7a53b8fa0d374234170ede239e37269c23946f88/cron/find_dx_data.py#L303-L310
Instead, we could search for both the BAMs and the indexes once for the whole project, then match them up by name/path, so that only 2 API calls are made for the CNV BAMs per project.
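The matching step could look something like the sketch below, which pairs each BAM with an index in the same folder using the results of the two project-wide searches. The function name is hypothetical, and it assumes indexes are named either `<name>.bam.bai` or `<name>.bai`, so it would need adjusting if the CNV indexes follow a different convention.

```python
def pair_bams_with_indexes(bams, indexes):
    """Map each BAM file ID to the ID of its index, or None if not found.

    Both arguments are lists of search results of the form
    {"id": ..., "describe": {"folder": ..., "name": ...}}.
    """
    # Look up indexes by (folder, name) so each BAM is matched in O(1).
    by_path = {
        (idx["describe"]["folder"], idx["describe"]["name"]): idx["id"]
        for idx in indexes
    }

    pairs = {}
    for bam in bams:
        folder = bam["describe"]["folder"]
        name = bam["describe"]["name"]
        stem = name[:-len(".bam")] if name.endswith(".bam") else name
        # Assumed naming conventions: sample.bam.bai or sample.bai
        candidates = (f"{name}.bai", f"{stem}.bai")
        pairs[bam["id"]] = next(
            (by_path[(folder, c)] for c in candidates if (folder, c) in by_path),
            None,
        )
    return pairs
```

Matching on folder as well as name avoids pairing a BAM with an identically named index elsewhere in the project.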