Currently, our package audit blobs are named using the following pattern:
package/<id>/<version>/<guid>.<operation>.json
Because blob storage can only be queried by name prefix, any query that is not keyed on the package being modified is very slow. For example, to find the package audit blobs added in the last 15 minutes, one must first list every blob and then sort and filter the results client-side.
Unfortunately, this is a required function of Feed2Catalog, which must access the deleted packages up to a timestamp.
We should change the way we store package audit logs to a form that we can query more efficiently.
This could be a new format...
<timestamp in ticks>/<id>/<version>/<operation>.json
...or we could move it to another service, such as table storage.
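As a rough sketch of how the timestamp-first format might help: if the ticks value is zero-padded to a fixed width, the lexicographic order of blob names matches chronological order, so "everything since time T" becomes a range over sorted names rather than a full scan. Everything here is illustrative and assumed, not existing code: the helper names (`to_ticks`, `audit_blob_name`, `blobs_since`), the 19-digit padding, and the sorted in-memory list standing in for a name-ordered blob listing.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Assumption: "ticks" means .NET-style ticks, i.e. 100 ns intervals since 0001-01-01.
EPOCH = datetime(1, 1, 1)
TICKS_PER_SECOND = 10_000_000


def to_ticks(moment: datetime) -> int:
    """Convert a naive datetime to a tick count using integer arithmetic."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def audit_blob_name(moment: datetime, package_id: str, version: str, operation: str) -> str:
    """Build a name in the proposed <ticks>/<id>/<version>/<operation>.json form.

    The ticks are zero-padded to 19 digits (an assumed width) so that
    string order and time order agree.
    """
    return f"{to_ticks(moment):019d}/{package_id}/{version}/{operation}.json"


def blobs_since(sorted_names: list[str], cutoff: datetime) -> list[str]:
    """Return the names at or after the cutoff from a name-sorted listing."""
    start = bisect_left(sorted_names, f"{to_ticks(cutoff):019d}")
    return sorted_names[start:]


if __name__ == "__main__":
    now = datetime(2016, 6, 1, 12, 0, 0)
    names = sorted(
        audit_blob_name(now - timedelta(minutes=m), "Example.Package", "1.0.0", "delete")
        for m in (60, 30, 10, 5)
    )
    for name in blobs_since(names, now - timedelta(minutes=15)):
        print(name)
```

One trade-off of this layout is that looking up all audit records for a single package would no longer be a prefix query, so it only fits if time-based access is the dominant pattern.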