Open richaagarwal opened 2 years ago
This work may end up addressing the bug reported in #611 as well, as we have a hypothesis that the lag reported there is due to PyPI's non-versioned endpoint taking a while to catch up to the latest release's data.
Note: to support #712, #702, and #703, we'll need to ingest all release dates from PyPI, not just the latest one.
This is currently blocked: the simple API may not be an option because it doesn't support upload time at the moment (see https://github.com/jwodder/pypi-simple/issues/5).
This is a follow-up to https://github.com/chanzuckerberg/napari-hub/issues/598, in which we implemented a hotfix for a breaking change introduced in the PyPI API we query, which removed the `releases` key from the versioned API endpoint we were using. More information here: https://github.com/pypi/warehouse/pull/11775. For the hotfix, we changed the query to hit the non-versioned API endpoint instead, with the knowledge that the field is considered deprecated there as well, though it has yet to be removed.

As this comment notes, there's no current timeline to remove `releases` from the non-versioned API endpoint, but it would be prudent to start thinking about a transition plan. In #598 I outlined these two options:

Option 1: Re-introduce fields by using the recommended simple API instead. This in turn could be broken out into two parts: reintroducing just those two fields, and then later possibly re-working all of `format_plugin` to rely on the simple API. (Ideally these would both be done at once, but depending on how important it is to get back to populating these fields, we could delay the latter work.)

Option 2: If accessing `upload_time_iso_8601` from the `urls` array is a reliable source for the `release_date` (which it appears to be), we may not need to switch APIs at all, and instead could re-work how we handle `first_released`. Ideally, we would only populate `first_released` the first time we grab data for a plugin, in which case it would be the same as `release_date`, with no need to ever get previous version releases in any given request.

It turns out that option 2 is not very straightforward given our current S3 architecture for storing data, so I'd recommend that we revisit this work when we are ready to prioritize moving to a database.
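
For reference, here is a rough sketch of the date handling that option 2 describes. It assumes the PyPI JSON response has already been parsed into a dict, and that the earliest `upload_time_iso_8601` among the files in `urls` is the right value for `release_date`. The function names and the shape of the stored record are hypothetical, not what `format_plugin` does today.

```python
from datetime import datetime
from typing import Optional


def _parse(ts: str) -> datetime:
    # PyPI timestamps end in "Z"; swap it for an explicit UTC offset so
    # datetime.fromisoformat accepts it on older Python versions too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def release_date_from_urls(payload: dict) -> Optional[str]:
    """Return the earliest upload_time_iso_8601 among the files in `urls`."""
    times = [
        f["upload_time_iso_8601"]
        for f in payload.get("urls", [])
        if f.get("upload_time_iso_8601")
    ]
    if not times:
        return None
    return min(times, key=_parse)


def update_dates(stored: Optional[dict], payload: dict) -> dict:
    """Set release_date on every fetch; set first_released only once.

    `stored` is the hypothetical record we already hold for the plugin
    (None the first time we see it).
    """
    release_date = release_date_from_urls(payload)
    first_released = (stored or {}).get("first_released") or release_date
    return {"release_date": release_date, "first_released": first_released}
```

The catch noted above still applies: `update_dates` needs the previously stored record for the plugin, which is the part that is awkward with the current S3 setup.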
@neuromusic let's connect on this when you're back!