Add feed of releases to API

tobias commented 3 weeks ago

We've had a request for a feed of releases.

I think we could do this via an API endpoint. Something like:

GET https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z

The required from param is a timestamp marking where the feed starts (releases after that timestamp will be returned). The response will be JSON, will include up to 30 days of releases, and will include a link to get the next page/batch:

{
  "next": "https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z",
  "releases": [
    {
      "version": "1.3-SNAPSHOT",
      "group-id": "foobar",
      "artifact-id": "foobar",
      "released-at": "2012-03-06T21:38:31.525Z"
    },
    {
      "version": "0.1",
      "group-id": "org.tcrawley",
      "artifact-id": "swank-clojure",
      "released-at": "2012-03-08T21:38:31.525Z"
    }
  ]
}
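For illustration, a minimal Python sketch of how a client might build the request URL and read a page shaped like the example above. The endpoint is only the proposal from this thread; `feed_url` and `parse_page` are hypothetical helper names, and percent-encoding the colons in the timestamp is an assumption about how a client would send it.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Proposed endpoint from this thread (an assumption, not a confirmed API).
FEED_URL = "https://clojars.org/api/release-feed"


def feed_url(start: datetime) -> str:
    """Build the request URL for a `from` timestamp (UTC, millisecond precision)."""
    utc = start.astimezone(timezone.utc)
    # %f yields microseconds (6 digits); drop the last 3 to get milliseconds.
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return f"{FEED_URL}?{urlencode({'from': stamp})}"


def parse_page(body: str):
    """Split a response body into its `next` link and its list of releases."""
    page = json.loads(body)
    return page["next"], page["releases"]
```

Calling `feed_url` with a timezone-aware datetime gives a URL whose `from` value has the same shape as the timestamps in the example response.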

The end of the feed would be signaled by an empty releases array, and the from value in the next property will be the released-at of the most recent release (though that can likely be considered just an implementation detail):

{
  "next": "https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z",
  "releases": []
}

Each non-SNAPSHOT version should appear only once in the feed, but SNAPSHOT versions could appear multiple times; ~they will appear for the latest version, but if a release occurs while you are paging the results, the SNAPSHOT will appear again. This is due to how we track versions in the db; SNAPSHOTs have a single entry in the table that is updated on release instead of a new one added (IIRC).~ That is incorrect; we store an entry per SNAPSHOT release, so they will appear in the feed at a position that matches each time it was released.

tobias commented 3 weeks ago

Would the above work for you @cursive-ide? This is, I think, the bare minimum, so I'm happy to discuss adding more data to the feed.

cursive-ide commented 3 weeks ago

Yes, I think that would work well. I'm a little confused by the pagination - I pass a from parameter, which will then get me releases up to 30 days after that date. But will the next field then return releases after the first 30? So the idea is that I would start from the oldest date and then iterate forward until there are none left?

cursive-ide commented 3 weeks ago

Also, it might be a good idea to have a flag to only include non-SNAPSHOT versions? I'm not sure about this, I'm not sure whether I'd want to index snapshots or not - I'll think about this.

tobias commented 3 weeks ago

@cursive-ide:

> I'm a little confused by the pagination - I pass a from parameter, which will then get me releases up to 30 days after that date. But will the next field then return releases after the first 30? So the idea is that I would start from the oldest date and then iterate forward until there are none left?

Yes, correct. You would pass from=date1, and would get 30 days' worth of releases. The next URL in the response would have from=date2, where date2 would be the earlier of:

- date1 plus 30 days
- "now"

You could then page until you got an empty array, and the next URL from that final response is where you could start next time.
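The paging loop described here could look roughly like this on the client side. This is a sketch under assumptions: `drain_feed` is a hypothetical name, and `fetch` stands in for whatever HTTP call returns the decoded JSON page for a URL.

```python
from typing import Callable, List, Tuple


def drain_feed(start_url: str, fetch: Callable[[str], dict]) -> Tuple[List[dict], str]:
    """Follow `next` links until a page has no releases.

    Returns every release seen plus the URL to resume from next time.
    `fetch` is a stand-in for an HTTP GET that returns the parsed JSON page.
    """
    releases: List[dict] = []
    url = start_url
    while True:
        page = fetch(url)
        url = page["next"]  # remember where the server says to continue
        if not page["releases"]:
            # An empty array is treated as the end of the feed.
            return releases, url
        releases.extend(page["releases"])
```

Passing `fetch` in as a parameter keeps the loop testable without a network connection; a real client would wrap its HTTP library of choice.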

However, I realize that that won't account for a 30 day period where there are no releases (I suspect we have gaps like that in the early days), as we will return an empty array for those gaps, which will appear to be the end of the stream. So we need another way to signal "there are no more pages".
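To make the gap problem concrete, here is a small Python sketch of the 30-day window scheme. `window_page` is a hypothetical helper that models the server side, not actual Clojars code, and the dates are made up for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)


def window_page(release_times, start, now):
    """Return releases in (start, end] and the next `from`, where
    end is the earlier of start + 30 days and now."""
    end = min(start + WINDOW, now)
    hits = [t for t in release_times if start < t <= end]
    return hits, end


# Made-up history with a gap of more than 30 days between two releases.
history = [datetime(2010, 1, 5), datetime(2010, 4, 1)]
now = datetime(2012, 3, 1)

first, next_from = window_page(history, datetime(2010, 1, 1), now)
second, next_from = window_page(history, next_from, now)
# `second` is empty even though a later release exists, so a client
# that stops at the first empty page would never see it.
```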

An alternative approach: instead of giving you 30 days of releases, we send up to n releases per page (100?). Then a page will be empty only when you have reached the end of the feed.

So then the from param in the next url would be either:

- the released-at value from the last release on the page (if there are release items returned)
- the from given in the request (if there are no release items to return)
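The two rules above for choosing the next from value can be sketched as follows. This is an illustration only: `count_page` is a hypothetical function, the releases are assumed to be sorted by released-at, and all timestamps are assumed to share the same UTC ISO-8601 format so that string comparison matches time order.

```python
def count_page(releases, start, limit=100):
    """Return up to `limit` releases after `start`, plus the next `from`.

    `releases` must be sorted by "released-at" ascending. Timestamps are
    compared as strings, which only works because they share one format.
    """
    hits = [r for r in releases if r["released-at"] > start][:limit]
    # Rule 1: last release on the page; rule 2: echo the requested `from`.
    next_from = hits[-1]["released-at"] if hits else start
    return {"next_from": next_from, "releases": hits}
```

One caveat of this sketch: because it uses a strict "after" comparison, releases that share the exact released-at of the last item on a page would be skipped on the following page. A real implementation would need to decide how to handle such ties.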

> Also, it might be a good idea to have a flag to only include non-SNAPSHOT versions? I'm not sure about this, I'm not sure whether I'd want to index snapshots or not - I'll think about this.

I'll start w/o this unless you say you need it; it would be simple to add later.

tobias commented 3 weeks ago

For context: if we returned 100 results/page, it would take 3045 pages to iterate through all of the releases throughout history. I think 500 results/page would also be fine from a performance or load perspective, which would mean only 608 pages to get all releases.

clojars / clojars-web

Add feed of releases to API #894