Open maceto opened 11 months ago
I'm not sure whether all of that information is available already, or where it lives if it is. But the technology name and category names are already available and in use, e.g.: https://cdn.httparchive.org/reports/cwvtech/ALL/ALL/jQuery.json
{
"app": "jQuery",
"category": "JavaScript libraries, Miscellaneous, Static site generator"
}
I don't think the description exists anywhere yet, but if the first aim is feature parity, then the main thing we need right now is name + categories.
The similar technologies can probably be based on the category names, if there's no data on it yet?
SELECT
client,
app AS technology,
# TODO
NULL AS description,
# CSV format
category,
# TODO: other technologies within category?
NULL AS similar_technologies,
origins
FROM
`httparchive.core_web_vitals.technologies`
WHERE
date = '2023-07-01' AND
geo = 'ALL' AND
rank = 'ALL'
ORDER BY
origins DESC
@sarahfossheim how should we source the similar_technologies field, something like "top 3 technologies within the same category"?
Also note that the description field isn't set in BigQuery so we'll leave it null for now.
@rviscomi, should we have any mandatory param for this endpoint?
I think just technology
cc @sarahfossheim
I think for the first version something like what you said makes sense: technologies with at least one category in common, sorted by number of origins, and then pick the top 3 (or maybe top 5?).
Or maybe an alternative could be:
Then technologies that have many categories in common will come up, even if they're a new or niche technology with not many origins. Which I think makes more sense when it comes to pinning down similar technologies.
If any data gets returned along with the technology names (e.g. number of origins), then we also need to pass in the rank and geo, so that the data for the similar technologies is filtered by the same criteria as the data for the current technology.
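To make the two ranking ideas concrete, here is a rough Python sketch. It assumes rows shaped like the query output above (technology, CSV category string, origins), already filtered to one client/geo/rank. The function names, the `rank_by_overlap` flag and the sample numbers are made up for illustration, and ranking by number of shared categories is only my reading of the alternative.

```python
from typing import Dict, List, Set


def split_categories(csv_value: str) -> Set[str]:
    """Turn the CSV category string into a set of trimmed category names."""
    return {name.strip() for name in csv_value.split(",") if name.strip()}


def similar_technologies(
    target: str,
    rows: List[Dict],
    top_n: int = 3,
    rank_by_overlap: bool = False,
) -> List[str]:
    """Return up to top_n technologies sharing at least one category with target.

    rows must already be filtered to a single client/geo/rank, so that the
    origin counts are comparable with those of the target technology.
    """
    categories = {row["technology"]: split_categories(row["category"]) for row in rows}
    origins = {row["technology"]: row["origins"] for row in rows}
    target_categories = categories[target]

    candidates = []
    for name, cats in categories.items():
        shared = len(cats & target_categories)
        if name != target and shared > 0:
            candidates.append((name, shared))

    if rank_by_overlap:
        # Assumed alternative: most shared categories first, origins as tie-breaker.
        candidates.sort(key=lambda c: (-c[1], -origins[c[0]], c[0]))
    else:
        # First version: simply the most widely used technologies first.
        candidates.sort(key=lambda c: (-origins[c[0]], c[0]))
    return [name for name, _ in candidates[:top_n]]


# Hypothetical sample data; the origin counts are invented.
SAMPLE_ROWS = [
    {"technology": "WordPress", "category": "CMS, Blogs", "origins": 1000},
    {"technology": "Drupal", "category": "CMS", "origins": 300},
    {"technology": "Ghost", "category": "CMS, Blogs", "origins": 50},
    {"technology": "Shopify", "category": "Ecommerce", "origins": 800},
    {"technology": "Joomla", "category": "CMS", "origins": 200},
]

print(similar_technologies("WordPress", SAMPLE_ROWS, top_n=2))
print(similar_technologies("WordPress", SAMPLE_ROWS, top_n=2, rank_by_overlap=True))
```

With the sample rows, the origins-based ranking surfaces the larger CMSs, while the overlap-based ranking puts the small technology that shares both categories first.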
Example of how to consume this endpoint
curl --get \
--url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/technologies' \
--data-urlencode 'category=["Blogs", "CMS", "Ecommerce"]' \
--data-urlencode 'technology=["WordPress", "Chameleon system"]'
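For anyone calling this from code rather than the shell: the brackets, quotes and spaces in those query parameters generally need percent-encoding. A small Python sketch of building the same URL follows. The helper name `build_technologies_url` is made up, and I'm assuming the endpoint decodes standard form-encoded query strings (including `+` for spaces).

```python
import json
from typing import List, Optional
from urllib.parse import urlencode

# Base URL taken from the curl example above.
BASE_URL = "https://dev-gw-2vzgiib6.ue.gateway.dev/v1/technologies"


def build_technologies_url(
    technologies: List[str],
    categories: Optional[List[str]] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the request URL with JSON-array parameters, percent-encoded."""
    # json.dumps yields the same ["a", "b"] array syntax used in the example.
    params = {"technology": json.dumps(technologies)}
    if categories:
        params["category"] = json.dumps(categories)
    return f"{base_url}?{urlencode(params)}"


url = build_technologies_url(
    ["WordPress", "Chameleon system"],
    categories=["Blogs", "CMS", "Ecommerce"],
)
print(url)
```

The resulting string can be passed to any HTTP client; no request is made by the sketch itself.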
@rviscomi @sarahfossheim, all the changes discussed are already deployed.
New URL https://dev-gw-2vzgiib6.uk.gateway.dev/v1/technologies
Documentation: https://github.com/HTTPArchive/tech-report-apis#get-technologies
Updated query to pull in the descriptions:
SELECT
client,
app AS technology,
description,
# CSV format
category,
# TODO: other technologies within category?
NULL AS similar_technologies,
origins
FROM
`httparchive.core_web_vitals.technologies`
JOIN
`httparchive.core_web_vitals.technology_descriptions`
ON
app = technology
WHERE
date = '2023-07-01' AND
geo = 'ALL' AND
rank = 'ALL'
ORDER BY
origins DESC
Hi @rviscomi,
why is there a static date in the WHERE clause of 2023-07-01 for technologies and 2023-08-01 for categories? I think we said this should be the latest month instead?
Yeah it should probably track the latest month.
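One way to track the latest month would be to replace the hard-coded date with a `MAX(date)` subquery unless a month is passed in explicitly. A rough Python sketch of a query builder is below; the function name and the optional `month` argument are assumptions, not something that exists in the repo.

```python
from datetime import date
from typing import Optional

TABLE = "httparchive.core_web_vitals.technologies"


def build_technologies_query(month: Optional[str] = None) -> str:
    """Return the technologies query, defaulting to the latest available date.

    month, if given, must be an ISO date string such as '2023-07-01';
    it is validated before being placed in the SQL text.
    """
    if month is None:
        # No month requested: let BigQuery resolve the most recent date.
        date_filter = f"date = (SELECT MAX(date) FROM `{TABLE}`)"
    else:
        # date.fromisoformat raises ValueError for anything that is not a date.
        date_filter = f"date = '{date.fromisoformat(month).isoformat()}'"
    return (
        "SELECT client, app AS technology, category, origins\n"
        f"FROM `{TABLE}`\n"
        f"WHERE {date_filter} AND geo = 'ALL' AND rank = 'ALL'\n"
        "ORDER BY origins DESC"
    )


print(build_technologies_query())
print(build_technologies_query("2023-07-01"))
```

If the table is partitioned by date, a subquery filter may scan more data than a literal date, so resolving the latest date once and passing it in as `month` could be cheaper.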
Is httparchive.core_web_vitals.technology_descriptions manually or auto generated? If manual, we wouldn't pick up the descriptions for any new technologies, right?
Could you describe the origin/source of this data?
List of similar technologies
Create a script to query this data from BQ, transform it, and save it in Firestore.
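A rough outline of what such a script could look like, using the first query from this thread. The Firestore collection name `technologies`, the `category_list` field and the function names are placeholders I picked for illustration. It assumes the `google-cloud-bigquery` and `google-cloud-firestore` packages plus default credentials.

```python
from typing import Any, Dict

QUERY = """
SELECT
  client,
  app AS technology,
  NULL AS description,
  category,
  NULL AS similar_technologies,
  origins
FROM
  `httparchive.core_web_vitals.technologies`
WHERE
  date = '2023-07-01' AND
  geo = 'ALL' AND
  rank = 'ALL'
ORDER BY
  origins DESC
"""


def to_document(row: Dict[str, Any]) -> Dict[str, Any]:
    """Transform one query row into the document to store."""
    categories = [c.strip() for c in (row["category"] or "").split(",") if c.strip()]
    return {
        "client": row["client"],
        "technology": row["technology"],
        "description": row.get("description"),
        "category": row["category"],
        # Hypothetical extra field: the CSV string split into a list.
        "category_list": categories,
        "similar_technologies": row.get("similar_technologies"),
        "origins": row["origins"],
    }


def main() -> None:
    # Imported here so the transform can be used without the cloud SDKs.
    from google.cloud import bigquery, firestore

    bq = bigquery.Client()
    db = firestore.Client()
    collection = db.collection("technologies")  # placeholder collection name

    for row in bq.query(QUERY).result():
        # add() stores the document under an auto-generated ID.
        collection.add(to_document(dict(row)))
```

Calling `main()` runs the full pipeline; `to_document` can be checked on its own with a plain dict.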