github / gh-gei

Migration CLI for GitHub to GitHub migrations
MIT License

Add support for running a "direct" migration (without blob storage) from GHES #751

Open timrogers opened 1 year ago

timrogers commented 1 year ago

In the GraphQL API, it is possible to run a "direct" migration from GitHub AE or GHES to GitHub Enterprise Cloud. In so-called "direct" mode, GHEC connects to the migration source directly to perform the data export, rather than the CLI performing the export itself and then uploading the data to a blob storage provider (an "indirect" migration).

Right now, this is not supported in the CLI. If you specify the --ghes-api-url argument to gh gei migrate-repo to point at your GHES instance, the presence of this argument causes the CLI to use the "indirect" blob storage flow.
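For reference, the indirect flow today looks roughly like this (flag and variable names as used in the current README; org names, hostnames, and tokens are placeholders):

```sh
# Indirect flow: because --ghes-api-url is present, the CLI performs the
# export itself and uploads the archive to blob storage (Azure shown here).
export GH_SOURCE_PAT="<source-token>"
export GH_PAT="<target-token>"
export AZURE_STORAGE_CONNECTION_STRING="<connection-string>"

gh gei migrate-repo \
  --github-source-org my-ghes-org \
  --source-repo my-repo \
  --github-target-org my-ghec-org \
  --target-repo my-repo \
  --ghes-api-url https://ghes.example.com/api/v3
```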

We should add a way for a user to specifically request a direct migration when specifying an API URL. This might also be a good time to rename the argument to something more generic, rather than GHES-specific. When making this change, we should be careful about backwards compatibility.
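As a strawman, the user-facing change might look something like this (both flags below are hypothetical and do not exist today; this is only a sketch of the shape of the option):

```sh
# Hypothetical sketch only - neither --source-api-url nor --direct-migration
# exists today. --source-api-url would be the more generic rename of
# --ghes-api-url; --direct-migration would skip blob storage and let GHEC
# pull from the source directly.
gh gei migrate-repo \
  --github-source-org my-ghes-org \
  --source-repo my-repo \
  --github-target-org my-ghec-org \
  --source-api-url https://ghes.example.com/api/v3 \
  --direct-migration
```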

dylan-smith commented 1 year ago

With GHES 3.8+ I don't think this feature is even possible to support anymore, because generating the migration archive now requires S3/Azure to be configured in GHES itself.

Do you agree @timrogers? If so, we should close this issue IMO.

lukens commented 1 year ago

Surely it's still possible to support for people who want to migrate off GHES < 3.8?

timrogers commented 1 year ago

> With GHES 3.8+ I don't think this feature is even possible to support anymore, because generating the migration archive now requires S3/Azure to be configured in GHES itself.
>
> Do you agree @timrogers? If so, we should close this issue IMO.

@dylan-smith I agree that this is no longer possible for versions 3.8 onwards - unless we make a further change in a future version, of course!

> Surely it's still possible to support for people who want to migrate off GHES < 3.8?

@lukens You're right that it would be possible for earlier versions, for the rare case where the Enterprise Server is accessible by GitHub over the internet.

Given the rarity of internet-accessible GHES instances and the new behaviour in 3.8 onwards, this isn't an issue we are likely to address, so I think it is the right choice to close it and signal that.

@lukens If you want to run this kind of migration, then you can trigger that using the GraphQL API.
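For anyone else landing here, a minimal sketch of that GraphQL call (the mutation and field names come from the public schema; the IDs, URLs, and tokens are placeholders, and sourceId would come from a prior createMigrationSource call):

```sh
# Direct mode, roughly: pass sourceRepositoryUrl plus a source access token,
# and omit gitArchiveUrl/metadataArchiveUrl so GHEC pulls the data itself.
# MIGRATION_SOURCE_ID and ORG_ID are placeholders.
gh api graphql -f query='
mutation($sourceId: ID!, $ownerId: ID!) {
  startRepositoryMigration(input: {
    sourceId: $sourceId
    ownerId: $ownerId
    repositoryName: "my-repo"
    sourceRepositoryUrl: "https://ghes.example.com/my-org/my-repo"
    accessToken: "<source-token>"
    githubPat: "<target-token>"
    continueOnError: true
  }) {
    repositoryMigration { id state }
  }
}' -f sourceId="$MIGRATION_SOURCE_ID" -f ownerId="$ORG_ID"
```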

lukens commented 1 year ago

Do you know it is a rarity, or is that an assumption? I'd also have thought that, even if it's a rarity in general, factors that make it more likely you'd want to upgrade to GitHub Enterprise Cloud include:

  1. You struggle to keep GHES updated
  2. You don't need to be behind a firewall

So, whilst these may be scarce in terms of total GHES instances, they may be more common amongst those migrating to the cloud.

I am aware that this can be done via the APIs, as that is what I eventually did; I was mainly trying to save others the pain I went through to get to that point (some of that pain has been removed anyway, I believe, by fixing the S3 regions bug).

If not addressing this by implementing the functionality, perhaps it could be addressed instead via documentation improvements (e.g. a callout in the documentation saying that if you have a publicly accessible GHES instance, your life may be simpler using the APIs directly, plus pointers to the relevant documentation).

I'm also curious how it works in GHES 3.8+. If the S3/Azure settings are specified in GHES itself, I'm assuming you don't need to specify them on the command line as well; instead, the tool would just request an archive link from GHES, which would return a pre-signed blob store link. Would this not effectively be the same as asking for an archive link and getting a pre-signed direct link back? What I'm getting at is: would a solution for this also work as a solution for 3.8+?

Obviously I'm not personally bothered what you do, as I've completed my migration; it just took me a lot longer than it would have if I'd been able to do a direct migration with the tool from the start, so I thought others might benefit from this. (On the face of it, it also seemed like a fairly trivial addition, as I was guessing most of what was needed was already there for cloud-to-cloud migrations - though that was guesswork/assumptions that could well be wrong.)

timrogers commented 1 year ago

> Do you know it is a rarity, or is that an assumption?

My experience with customers shows that the vast majority of those migrating to the cloud have "locked down" GHES instances.

I can't give a hard percentage as we don't collect telemetry from GHES customers - but from my customer interactions, I'd guess less than 1 in 10.

> If not addressing this by implementing the functionality, perhaps it could be addressed instead via documentation improvements (e.g. a callout in the documentation saying that if you have a publicly accessible GHES instance, your life may be simpler using the APIs directly, plus pointers to the relevant documentation).

I think that's a very fair suggestion, and something I'll pick up. That said, there are known issues with that path - for example it doesn't work with repos larger than 2GB. That's the reason for the changes in 3.8.

> I'm also curious how it works in GHES 3.8+. If the S3/Azure settings are specified in GHES itself, I'm assuming you don't need to specify them on the command line as well; instead, the tool would just request an archive link from GHES, which would return a pre-signed blob store link. Would this not effectively be the same as asking for an archive link and getting a pre-signed direct link back? What I'm getting at is: would a solution for this also work as a solution for 3.8+?

You're exactly right about how it works in GHES 3.8 onwards, and you're also right that, for an internet-accessible GHES instance, we could feed that URL directly to GitHub.com.
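Concretely, that flow can be sketched against the org migrations REST endpoints on GHES (the endpoint paths are real; the host, org, token, and migration id are placeholders):

```sh
# Start an export on the GHES side; in 3.8+ the appliance writes the archive
# to the S3/Azure storage configured in GHES itself.
curl -s -X POST \
  -H "Authorization: Bearer <source-token>" \
  -H "Content-Type: application/json" \
  -d '{"repositories":["my-repo"],"lock_repositories":false}' \
  https://ghes.example.com/api/v3/orgs/my-org/migrations

# Poll GET /orgs/my-org/migrations/$MIGRATION_ID until state is "exported",
# then fetch the archive location - the 302 Location header is the pre-signed
# blob storage URL that could, in principle, be handed straight to GitHub.com.
curl -s -o /dev/null -D - \
  -H "Authorization: Bearer <source-token>" \
  "https://ghes.example.com/api/v3/orgs/my-org/migrations/$MIGRATION_ID/archive"
```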

The bit we would have to figure out from a UX perspective is "how do we know if the URL is accessible, or if instead we need to upload it to blob storage?". That's something I can think about. Of course, we'd also need to prioritise this against changes to help 3.8+ users, who will be the majority before long.
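One naive heuristic, purely hypothetical and not current CLI behaviour: probe the archive URL from the machine running the CLI before choosing a flow. This only proves the runner can reach it, not that GitHub.com can, which is exactly the gap above:

```sh
# Hypothetical probe - demonstrates the UX question rather than answering it:
# reachability from the runner does not imply reachability from GitHub.com.
if curl -sfI --max-time 10 "$ARCHIVE_URL" > /dev/null; then
  echo "Reachable from here; optimistically try the direct flow"
else
  echo "Not reachable; fall back to uploading to blob storage"
fi
```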

> Obviously I'm not personally bothered what you do, as I've completed my migration; it just took me a lot longer than it would have if I'd been able to do a direct migration with the tool from the start, so I thought others might benefit from this. (On the face of it, it also seemed like a fairly trivial addition, as I was guessing most of what was needed was already there for cloud-to-cloud migrations - though that was guesswork/assumptions that could well be wrong.)

Thanks for the feedback - I do really appreciate it! It's useful to hear from a customer with a different perspective, and to hear that "direct" migration turned out to be the better option for you. Thanks for taking the time to share ❤️