chadbaldwin / chadbaldwin.github.io

The repo that drives my blog chadbaldwin.net
http://chadbaldwin.net

Copy a large table between servers, a couple wrong ways, maybe one right way #20

utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Copy a large table between servers, a couple wrong ways, maybe one right way | Chad’s Blog

Have you ever needed to copy a giant 50 million record table from one server to another? Because I did…I failed a few times. But eventually I figured it out with the help of the SQL community.

https://chadbaldwin.net/2021/10/19/copy-large-table.html

MattEHarrington commented 3 years ago

This is my first time on the site, but it seems odd that you had to jump through so many hoops to get this done, when your DBA could probably have gotten it done a lot quicker without third-party tools. That said, I'd never heard of dbatools before, so this was an interesting read for sure! (I only have a few years of professional database experience.)

chadbaldwin commented 3 years ago

@MattEHarrington it was a bit of an odd scenario. Our DBA was out on PTO/vacation, I didn't want to wait to get the project done, and this was the only thing holding me back. My initial assumption was that it would only take a few minutes...but it turned into one of those rabbit-hole situations where the expected 10 minutes became a couple of hours. I'm still happy I went through the process, though, because now I know how to use bcp as well as Copy-DbaDbTableData, which will definitely come in handy in the future.
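For anyone curious, a minimal bcp round trip looks roughly like this (a sketch only; the server, database, table, and file names are placeholders, not my actual setup):

```powershell
# Export to a native-format file over a trusted connection
# (-T = Windows auth, -n = native format). All names are placeholders.
bcp SourceDb.dbo.BigTable out C:\temp\BigTable.dat -S SourceServer -T -n

# Import in batches: -b commits every 50k rows instead of one giant
# transaction, and -h "TABLOCK" takes a table lock for a faster bulk load.
bcp TargetDb.dbo.BigTable in C:\temp\BigTable.dat -S TargetServer -T -n -b 50000 -h "TABLOCK"
```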

That said, while dbatools may be a third-party tool, I'd consider it fairly standard in a lot of shops. I've even seen it listed on quite a few job postings for both DBA and developer roles. If you've never heard of it, I highly recommend getting familiar with it. It's a great set of tools, especially if you're working with lots of instances or need to automate recurring tasks.
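To give a taste, the basic call is a one-liner along these lines (a sketch; the instance, database, and table names are placeholders):

```powershell
# Minimal Copy-DbaDbTableData sketch; all names are placeholders.
# -AutoCreateTable builds the destination table if it's missing;
# -BatchSize controls how many rows are sent per bulk-copy batch.
Copy-DbaDbTableData -SqlInstance 'SourceServer' -Database 'SourceDb' `
    -Destination 'TargetServer' -DestinationDatabase 'TargetDb' `
    -Table 'dbo.BigTable' -DestinationTable 'dbo.BigTable' `
    -AutoCreateTable -BatchSize 50000
```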

datadill commented 3 years ago

Hey Chad! This is an excellent blog post. Something to consider when using dbatools to transfer data: even if you kill the process in the PowerShell ISE/IDE, the session can still exist on the SQL Server side. I was running the copy on one of our secondaries to prevent blocking, killed the PowerShell process thinking it was cleared from SQL Server, and found that was not the case.
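One way to check for (and clean up) those leftover sessions is with dbatools itself; a rough sketch, with placeholder instance and database names:

```powershell
# List sessions still alive on the SQL Server side after the
# PowerShell process was killed (instance/database are placeholders).
Get-DbaProcess -SqlInstance 'TargetServer' -Database 'TargetDb' |
    Select-Object Spid, Login, Host, Program

# Once you've confirmed which spid is the orphan, kill it explicitly.
Stop-DbaProcess -SqlInstance 'TargetServer' -Spid 123
```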

Something else I would recommend is looking into the Import/Export Wizard within SSMS. It's only a couple of steps, although it's not possible to script out.

Finally, I have never done this before, but creating a BIML package might be another option, assuming all you are doing is moving data from one table to another. Both of these options will most likely be faster than dbatools.

chadbaldwin commented 3 years ago

@datadill thanks for the feedback! I don't use the PowerShell ISE, but I'll keep that in mind if I ever do. I generally use VS Code and monitor the run through Task Manager. If I want to kill it, I kill it directly from the terminal window in VS Code or from Task Manager, and then check the SQL Server sessions to make sure nothing is left hanging and holding locks or an open transaction.

You're not the first to suggest other methods, such as the SSMS data transfer wizard.

Here was my response to that if you're interested:

https://www.reddit.com/r/SQLServer/comments/qbg4zt/blog_copy_a_large_table_between_servers_a_couple/hhd5tfq/?utm_source=reddit&utm_medium=web2x&context=3

I tried testing it, but it was projected to take 89 minutes to complete, which is still much slower than dbatools. I will mention that it technically is scriptable, though, since you can produce an SSIS package (a .dtsx file) from it that can be run on its own, which I've used before as well.
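(For reference, running a saved package is a one-liner; the path here is a placeholder:)

```powershell
# Execute a file-system .dtsx package with the SSIS runtime.
dtexec /File "C:\packages\CopyBigTable.dtsx"
```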

I would be fairly surprised to find many things faster than dbatools, since it uses the .NET SqlBulkCopy class, which sits on the same bulk copy mechanism bcp uses, and bcp is also incredibly fast. I would guess SSIS uses SqlBulkCopy as well, but I generally don't like the overhead of building an SSIS package.
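For the curious, what SqlBulkCopy is doing under the hood boils down to something like this (a bare-bones sketch; the connection strings and table names are placeholders):

```powershell
# Stream rows from a source query straight into the destination table
# using .NET's SqlBulkCopy. All connection details are placeholders.
$src  = New-Object System.Data.SqlClient.SqlConnection 'Server=SourceServer;Database=SourceDb;Integrated Security=True'
$dest = New-Object System.Data.SqlClient.SqlConnection 'Server=TargetServer;Database=TargetDb;Integrated Security=True'
$src.Open(); $dest.Open()

$cmd = $src.CreateCommand()
$cmd.CommandText = 'SELECT * FROM dbo.BigTable'
$reader = $cmd.ExecuteReader()

$bulk = New-Object System.Data.SqlClient.SqlBulkCopy $dest
$bulk.DestinationTableName = 'dbo.BigTable'
$bulk.BatchSize = 50000
$bulk.WriteToServer($reader)   # streams rows; nothing is buffered in full

$reader.Close(); $src.Close(); $dest.Close()
```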

I'll check out the BIML thing, thanks!

PartnerRe commented 1 year ago

Great post! For flexibility, and to distribute the data across several databases and servers, I use PowerShell. We have a process that allocates multiple tables (each with a unique run number!), sometimes with millions of records per run. The data is later used by our analytics people as input for their trickery. The processing database was growing very large very quickly (we knew it would), so I had to implement a way to distribute the data while keeping central control over where the output of each run is located. DBAs wouldn't be a big help in this scenario.

A pragmatic solution was to add a distribution step at the end that moves the tables automatically to their destinations and keeps the inventory updated so we know where everything is. A nice side effect of this setup: the processing happens on a very powerful (and costly) SQL Server instance, while the output lands on a much cheaper setup. Have to check out Copy-DbaDbTableData though!
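If it works the way the post describes, our distribution step might collapse into something like this sketch (every name here, including the inventory table, is a placeholder standing in for our internal setup):

```powershell
# Hypothetical distribution step: move each run's output table to its
# assigned cheap server, then record where it landed in a central
# inventory. All server, database, and table names are placeholders.
$runTables = @('dbo.RunOutput_1001', 'dbo.RunOutput_1002')

foreach ($table in $runTables) {
    Copy-DbaDbTableData -SqlInstance 'ProcessingServer' -Database 'ProcessingDb' `
        -Destination 'ArchiveServer' -DestinationDatabase 'ArchiveDb' `
        -Table $table -AutoCreateTable

    # Keep the central inventory in sync so every run stays findable.
    Invoke-DbaQuery -SqlInstance 'ControlServer' -Database 'ControlDb' -Query "
        UPDATE dbo.RunInventory
        SET    LocationServer = 'ArchiveServer', LocationDatabase = 'ArchiveDb'
        WHERE  TableName = N'$table';"
}
```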