During an unrelated investigation into the boto3 package download stats, I noticed that a significant portion of the recorded downloads were for files like boto3-1.xx.xx-py3-none-any.whl.metadata. Using the publicly available dataset, I ran some queries and found that these metadata files accounted for ~18.55% of our "downloads" (query and results provided below).
Request
Ignore files like *.whl.metadata since including them results in metrics that do not accurately reflect end-user downloads.
SQL Query
#standardSQL
SELECT
  COUNT(CASE WHEN file.filename LIKE '%.whl.metadata' THEN 1 END) AS whl_metadata_downloads,
  COUNT(CASE WHEN file.filename LIKE '%.whl' THEN 1 END) AS whl_downloads,
  COUNT(CASE WHEN file.filename LIKE '%.tar.gz' THEN 1 END) AS source_downloads,
  COUNT(*) AS total_downloads,
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  -- Query information for the boto3 project
  file.project = 'boto3'
  -- Only query the last 30 days of history
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND CURRENT_DATE()
  -- Only consider downloads using pip
  AND details.installer.name = 'pip'
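To illustrate the request, a variant of the query above could drop the metadata files before counting. This is only a sketch and has not been run against the dataset: it reuses the table and filters from the original query, and the `package_downloads` alias is a hypothetical name chosen here.

```sql
#standardSQL
SELECT
  -- Count every file request that remains after the metadata filter below
  COUNT(*) AS package_downloads  -- hypothetical alias
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  file.project = 'boto3'
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND CURRENT_DATE()
  AND details.installer.name = 'pip'
  -- Exclude PEP 658 metadata files, which are fetched without installing anything
  AND NOT ENDS_WITH(file.filename, '.whl.metadata')
```

`ENDS_WITH` is used instead of `LIKE '%.whl.metadata'` only for readability; either form should select the same rows.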
Results
whl_metadata_downloads  whl_downloads  source_downloads  total_downloads
263675949               1157843064     56343             1421575356
263675949 / 1421575356 * 100 = ~18.55%
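The same percentage could also be computed inside the query rather than by hand. A minimal sketch, assuming the same table and filters as the original query; `whl_metadata_pct` is a hypothetical alias, and because the 30-day window is relative to the current date, the value returned will depend on when the query runs.

```sql
#standardSQL
SELECT
  -- Share of requests that were for *.whl.metadata files, as a percentage
  ROUND(
    SAFE_DIVIDE(
      COUNTIF(ENDS_WITH(file.filename, '.whl.metadata')),
      COUNT(*)
    ) * 100,
    2
  ) AS whl_metadata_pct  -- hypothetical alias
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  file.project = 'boto3'
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND CURRENT_DATE()
  AND details.installer.name = 'pip'
```

`SAFE_DIVIDE` returns NULL instead of raising an error if no rows match, which keeps the query usable for projects with no recorded downloads in the window.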
Additional Information
The *.whl.metadata files were introduced in PEP 658 as a way for package managers “to inspect distribution metadata without intending to install the distribution”. Support for them was integrated into pip in version 22.3.