apache / pekko

Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala
https://pekko.apache.org/
Apache License 2.0
1.17k stars 140 forks source link

Use sbt-rsync plugin instead of github action or remove sbt-publish-rsync #104

Open mdedetrich opened 1 year ago

mdedetrich commented 1 year ago

While looking at https://github.com/apache/incubator-pekko/issues/103 I noticed that the pekko project already happens to have infrastructure setup to publish nightlies (aka snapshots) to an rsync directory in the exact same manner that was implemented in https://github.com/apache/incubator-pekko/pull/60.

This is done by using https://github.com/lightbend/sbt-publish-rsync and you can see the details at https://github.com/apache/incubator-pekko/blob/main/project/Publish.scala.

We should either investigate using the sbt-publish-rsync plugin or remove it entirely from pekko since https://github.com/apache/incubator-pekko/blob/main/.github/actions/sync-nightlies/action.yml exists. My own personal preference would be to use sbt-publish-rsync since its far simpler (due to it being done by sbt the command to publish would be quite trivial especially when combined with publishing to Maven snapshots directory, see https://github.com/apache/incubator-pekko/issues/103) and I like reducing the direct dependance on shell scripts. From a cursory glance of looking at https://github.com/lightbend/sbt-publish-rsync/blob/master/src/main/scala/com/lightbend/sbt/publishrsync/PublishRsyncPlugin.scala#L45-L49 it shouldn't be problematic however there is a risk that changes might need to be done to sbt-publish-rsync plugin for Apache's use, if needed this does present an opportunity to create an sbt-publish-rsync-apache plugin.

@seglo @pjfanning Thoughts?

seglo commented 1 year ago

The Akka team used rsync to publish documentation assets to a Lightbend maintained server (gustav). Java snapshots were published to maven repositories, originally a bintray repo, and then the actual sonatype snapshots repo when bintray went away. I briefly considered repurposing the Publish and sbt-publish-rsync infra, but when I found Apache Arrow's nightlies implementation with a GitHub Action I opted to use it because the precedent was already established by another Apache project and I liked that it was setup to automatically truncate the number of snapshots kept on nightlies.

One of the issues with Akka's current snapshot publishing is that snapshots are available indefinitely until they're deleted manually or until the sonatype infra team decides to do some cleanup, Arrow's solution to keep the last 30 versions seemed like a nice solution. To achieve the same thing using sbt-publish-rsync would require some customization either in the form of a shell script sbt commands (i.e. steps in an sbt-release setup), or customization of sbt-publish-rsync. I think that given publishing snapshots to nightlies is realistically only going to be done by the Pekko project there's not a lot of value in making the process portable.

I think we should drop the existing rsync functionality found in the project infra since it no longer serves a purpose.

mdedetrich commented 1 year ago

Thanks, after thinking about this I agree with your conclusion to just remove sbt-publish-rsync. There may be a use for it later down the track (if I can figure out the 30 day expiry) but for now its far simpler to just leave your solution.

There is also the fact that implementing this in sbt also requires a workaround due to the fact that sbt only currently allows to publish to one repo, so you would have to do something like https://stackoverflow.com/a/21032721

pjfanning commented 1 year ago

I like the current pekko nightly solution. One minor quibble is that the version numbers for Scala 2.12 jars differ from the 2.13 jars. Likewise, for Scala 3 jars.

mdedetrich commented 1 year ago

I like the current pekko nightly solution. One minor quibble is that the version numbers for Scala 2.12 jars differ from the 2.13 jars. Likewise, for Scala 3 jars.

Are you talking about the artifact prefixes? If so this is intentional, its how Scala deals with binary compatibility

pjfanning commented 1 year ago

The most recent Scala 2.13 jars are 0.0.0+26529-29e5cf4b+20230116-1205-SNAPSHOT https://nightlies.apache.org/pekko/snapshots/org/apache/pekko/pekko-actor_2.13/

The most recent Scala 2.12 jars are 0.0.0+26529-29e5cf4b-SNAPSHOT https://nightlies.apache.org/pekko/snapshots/org/apache/pekko/pekko-actor_2.12/

The most recent Scala 3 jars are 0.0.0+26529-29e5cf4b+20230116-1213-SNAPSHOT https://nightlies.apache.org/pekko/snapshots/org/apache/pekko/pekko-actor_3/

mdedetrich commented 1 year ago

Ah I see what you mean now, its the version numbers being off (i.e. different) for the same single run of publish. We should probably make an issue for this because its going to create confusion for end users.

seglo commented 1 year ago

The only difference is the timestamp component of the version. I think it's because the publish command is a cross-build task +publishM2 and I guess the timestamp is derived for each version of Scala. We could customize the version format to drop that component if we want them to be consistent, but I think the timestamp is useful when you're experimenting locally and you don't have local commits yet to differentiate versions.

mdedetrich commented 1 year ago

The principled solution would be the ability to parameterize the timestamp as an input in the publish command so you only generate the timestamp once (i.e. when just before publish is executed) and then feed it into the publish command so the timestamp is consistent across versions.

I don't know if this is possible in sbt especially with the usage of +<command> for cross publishing.

but I think the timestamp is useful when you're experimenting locally and you don't have local commits yet to differentiate versions.

There is truth to this but I would assume that if you are publishing locally you would use publishLocal and at least personally for me I don't care about the timestamp as I am only interested in using the latest version as its an iterative developer flow with quick turnaround

seglo commented 1 year ago

I found a stack overflow answer that has a recipe for pushing the timestamp to the filesystem and reading it across builds. https://stackoverflow.com/questions/21053785/setting-unique-snapshot-version-when-cross-building-in-sbt

mdedetrich commented 1 year ago

I found a stack overflow answer that has a recipe for pushing the timestamp to the filesystem and reading it across builds. https://stackoverflow.com/questions/21053785/setting-unique-snapshot-version-when-cross-building-in-sbt

Not the nicest solution but it works and is a nice improvement, ill make an issue for it

seglo commented 1 year ago

There is truth to this but I would assume that if you are publishing locally you would use publishLocal and at least personally for me I don't care about the timestamp as I am only interested in using the latest version as its an iterative developer flow with quick turnaround

Yes, it's extremely anecdotal but it is something I've done before!

I think in general the timestamp is nice to have for reference sake, at least the date component is.

jrudolph commented 1 year ago

Usually, the timestamp is only added when the workspace is not clean, so we need to figure out why that's the case. Otherwise, using the git describe output (as usual) should give enough details to order snapshot releases and associate them with the corresponding commit.

mdedetrich commented 1 year ago

@jrudolph I think this is the wrong place for your comment, I think this discussion is of relevance https://github.com/apache/incubator-pekko/pull/105#discussion_r1071912938

pjfanning commented 1 year ago

are we in a position to close this?

mdedetrich commented 1 year ago

So I think there is merit in the premise of this ticket but its definitely not required for our first release so removing it from the milestone.

pjfanning commented 1 year ago

I think the rsync that we have in CI is fine - so we need this @mdedetrich ?

mdedetrich commented 1 year ago

Its an improvement I would like to do because although the CI solution works its a very basic/primitive one, i.e. it needs to have knowledge of Scala internals so if we for example change scala versions we can break publishing, see https://github.com/apache/incubator-pekko/blob/main/.github/workflows/publish-nightly-docs.yml#L63

The most ideal solution is still an SBT plugin because its easier to maintain/automatic/much less boilerplate since we can use it in a common sbt-pekko plugin but its def not a priority

mdedetrich commented 1 year ago

I think the rsync that we have in CI is fine - so we need this @mdedetrich ?

Just want to add that with the recent work happening on doc publishing due to release, I would argue that its further increasing the the legitimacy of such a plugin.

If you have a look at our standard publish action right now (i.e. https://github.com/apache/incubator-pekko/blob/main/.github/workflows/publish-1.0-docs.yml#L61-L95) you can see how unwieldy its starting to become. The main reason behind this is binary/project/scala versions is all a black box to github actions and so questions like "what is the last patch version of pekko" and "only maintain patch version docs for the latest x releases" are much easier to automate via an sbt-plugin.

And remember that the current solution (which is a workaround with hardcoded constants) is only for a single repo, we have like 10-12