Closed pablo-gar closed 9 months ago
Why not just remove / deprecate these API? I.e., what convenience do they provide over the Discover API?
If you do elect to retain them, there are other issues to be resolved, e.g., the return type assumes it can return any URI scheme (e.g., s3://...), but with this change it will only return https://.... Probably should simplify the function to return only a URI string
See related notes in #89 - eg, we likely need a schema change as well.
@atolopko-czi did a retroactive analysis on the S3 logs for the Census bucket h5ads
. He showed that, since May 2023:
There have been 959,924 H5AD downloads:
aws s3 sync
callscellxgene_census.download_source_h5ad()
These come from a total of 25 unique IPs, top 13:
remoteip count(1)
134.174.140.55 192446
34.135.174.4 98065
35.238.225.20 91556
150.230.44.69 87392
165.1.79.188 85876
150.230.148.164 82975
173.82.202.31 54713
138.2.237.78 35299
54.151.103.146 7688
138.2.224.226 7662
74.125.77.23 4920
74.125.77.22 4755
74.125.77.21 4308
These data show that users (albeit a small number) likely use these APIs (download_source_h5ad
/get_source_h5ad_uri
) for programatic access to H5ADs. With this information I'm inclined to keep maintaining the APIs.
Obsoleted by #935 - will not fix.
Blocked by #6 and #548. Related to #89
Once
dataset_version_id
is included in the Census data, then the APIsdownload_source_h5ad
/get_source_h5ad_uri
must use ~the Discover API~GET
from CloudFront instead of H5AD copies currently deposited in the Census S3 bucket (see #289)Tasks: