EndPointCorp / end-point-blog

End Point Dev blog
https://www.endpointdev.com/blog/

Comments for Editing large files in place #237

Open phinjensen opened 6 years ago

phinjensen commented 6 years ago

Comments for https://www.endpointdev.com/blog/2009/12/editing-large-files-in-place/ By Greg Sabino Mullane

To enter a comment:

  1. Log in to GitHub
  2. Leave a comment on this issue.
phinjensen commented 6 years ago
original author: Adrian
date: 2009-12-15T22:04:02-05:00

Nice work, and nice write-up! Thanks for sharing!

phinjensen commented 6 years ago
original author: Moltonel
date: 2009-12-16T05:25:42-05:00

This trick can come in handy, but gets painful when you need to alter the number of characters.

What about piping sed output to psql, instead of asking sed to write a file? I've used this technique on pg dumps before, out of laziness, when I could easily have edited the file. It feels natural enough.

Don't know about stopping after N replacements with sed. Sure, it's possible, but I don't want to dig into it when you can just specify line numbers, or just ignore the negligible overhead of sed'ing the entire file (we'll be waiting for Postgres anyway, not sed).

Oh, and apropos of emacs, what kind of DBA is still using a 32-bit OS these days? :p

phinjensen commented 6 years ago
original author: Platonides
date: 2009-12-16T09:07:30-05:00

A quite risky operation. I would have probably taken out ~1 sector/page, edited it and then replaced the original one.

Not the main point of this post, since you already had the file loaded, but instead of manually copying the file size into the seek parameter, you can do dd seek=$(stat -c %s data.20091215.pg) if=/dev/zero of=data.20091215.pg bs=1024 count=99999

phinjensen commented 6 years ago
original author: Greg Sabino Mullane
date: 2009-12-16T09:31:04-05:00

Moltonel:

True, about a piped-to-psql sed not being expensive compared to the other bits, but my worry there would be about replacing something it shouldn't. It's easy enough to know that 'template0' is a unique string in the first few lines, but what if it appears buried in the data later on in the 50GB+ file? Also, in this case there were similar lines immediately after the three in question that I did not want to be replaced. If I did have to alter the number of characters, I probably would write a quick perl script to pipe it through. All the power of sed, plus I can tell it to stop processing after a certain point and just turn into a dumb pipe (e.g. (print and next) if $. > 300)
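
Greg's Perl idea (edit near the top, then degrade into a dumb pipe) translates to Python roughly as below. This is a sketch, not his script: `edit_head_then_pipe` is a hypothetical name, the 300-line cutoff is taken from his example, and `re.sub` replaces every match on an addressed line (like Perl's `s///g`). Because it rewrites the stream rather than the file, it can change line lengths freely, which the byte-for-byte in-place trick cannot.

```python
import re
from collections.abc import Iterable, Iterator


def edit_head_then_pipe(lines: Iterable[str], pattern: str, repl: str,
                        cutoff: int = 300) -> Iterator[str]:
    """Run re.sub(pattern, repl, ...) on lines 1..cutoff, then pass every
    later line through untouched, like Perl's (print and next) if $. > 300."""
    regex = re.compile(pattern)
    for lineno, line in enumerate(lines, start=1):
        if lineno > cutoff:
            yield line  # "dumb pipe" phase: no matching at all
        else:
            yield regex.sub(repl, line)  # may change the line's length
```

Used as a filter between the dump and psql, it would be driven by something like `sys.stdout.writelines(edit_head_then_pipe(sys.stdin, "template0", "template1"))`.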

As far as emacs goes, that's seldom in my control on client boxes, and not all distros have a 64-bit compiled emacs available. Even the 64-bit version has a limit (unlike vi), but at that point I'm not likely to edit directly anyway; I'd use something like dd or split. :) But there are plenty of times when I've wanted to edit a few-hundred-meg file and emacs failed me, so I had to use vi.

phinjensen commented 6 years ago
original author: Greg Sabino Mullane
date: 2009-12-16T09:34:37-05:00

Platonides:

Yes, quite risky, but also quite efficient, and this work was done under a lot of pressure to get things done quick. In my defense, I did test the process out first by doing a head -10000 largefile > foobar, and then modifying foobar. Thanks for the stat -c trick!
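
Greg's dry run (`head -10000 largefile > foobar`, then editing `foobar`) can be reproduced in Python with `itertools.islice`. This is only a sketch; `copy_head` is a hypothetical helper, and it works on bytes so an odd encoding in the dump doesn't get in the way.

```python
from itertools import islice


def copy_head(src: str, dst: str, n_lines: int = 10000) -> int:
    """Copy the first n_lines lines of src into dst, like head -n N src > dst.
    Returns the number of bytes written."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in islice(fin, n_lines):  # stops reading after n_lines lines
            fout.write(line)
            written += len(line)
    return written
```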

phinjensen commented 6 years ago
original author: Moltonel
date: 2009-12-16T11:02:25-05:00

Here, I looked it up and it is actually very simple:

sed 20,30s/template0/template1/ will do the work only for lines 20 to 30.

sed '15s/template0/template1/;17s/template0/template1/' will do the work specifically for line 15 and 17.
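
For comparison, here is the same line addressing sketched in Python. It is an approximate equivalent, not sed itself: `sub_on_lines` is a hypothetical name, matching is plain-string rather than regex, and like `s///` without the `g` flag it replaces only the first match on each addressed line.

```python
from collections.abc import Container, Iterable, Iterator


def sub_on_lines(lines: Iterable[str], old: str, new: str,
                 targets: Container[int]) -> Iterator[str]:
    """Like sed 'Ns/old/new/' for every N in `targets` (1-based line numbers):
    replace the first occurrence of `old` on those lines only."""
    for lineno, line in enumerate(lines, start=1):
        yield line.replace(old, new, 1) if lineno in targets else line
```

Passing `range(20, 31)` corresponds to `20,30s/template0/template1/`, and `{15, 17}` to the second command.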

phinjensen commented 6 years ago
original author: Ezekiel
date: 2009-12-16T12:46:16-05:00

Really awesome tricks, thanks Greg!

Add the "-i" or "--in-place" flag to sed to make it modify in-place.

Using dd to copy a piece of the file to another machine over ssh for editing might also be useful under stress ("vim -R" opens it in read-only mode and refrains from creating an on-disk swap file; you can still write changes when finished). Then use dd again to put the chunk right back over the file.

Certainly enjoyed the post!

phinjensen commented 6 years ago
original author: Jon Jensen
date: 2009-12-16T13:16:26-05:00

Ezekiel, I just did a quick check out of curiosity and found that sed -i does "in-place" edits the same way Perl does: by writing a new file and then moving it in place of the original. Worth noting in case anyone reading is misled by the term "in-place".

Is there anything more automated than Greg's method that does true in-place edits of the blocks of an existing inode where they are? There are nice hex editors etc., but anything commonly distributed on e.g. Linux systems?
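
Nothing in this thread names such a tool, but a small script can do a true in-place edit. A minimal Python sketch, assuming the replacement has exactly the same byte length as the original: it scans the file in chunks, carries over `len(old) - 1` bytes so a match spanning two chunks is not missed, and overwrites each hit through a file opened with mode `"r+b"`, which writes into the existing file (same inode) instead of creating a new one. `replace_in_place` and its `max_count`/`chunk_size` parameters are hypothetical; `max_count` plays the role of "stop after N replacements".

```python
def replace_in_place(path: str, old: bytes, new: bytes,
                     max_count: int = 1, chunk_size: int = 1 << 20) -> list[int]:
    """Overwrite up to max_count non-overlapping occurrences of `old` with the
    equal-length `new`, left to right. Returns the file offsets changed."""
    if not old or len(old) != len(new):
        raise ValueError("old and new must be non-empty and equal in length")
    changed: list[int] = []
    overlap = len(old) - 1  # a match can straddle two chunks by at most this much
    base = 0                # file offset of window[0]
    tail = b""              # bytes carried over from the previous window
    next_ok = 0             # matches must start here or later (no overlaps)
    with open(path, "r+b") as f:  # read/write, no truncation, same inode
        while len(changed) < max_count:
            f.seek(base + len(tail))  # writes moved the position; re-seek
            chunk = f.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = max(0, next_ok - base)
            while len(changed) < max_count:
                idx = window.find(old, start)
                if idx == -1:
                    break
                f.seek(base + idx)
                f.write(new)  # same length: no other byte in the file moves
                changed.append(base + idx)
                next_ok = base + idx + len(old)
                start = idx + len(old)
            keep = min(overlap, len(window))
            tail = window[len(window) - keep:]
            base += len(window) - keep
    return changed
```

For the dump discussed here, something like `replace_in_place("data.20091215.pg", b"template0", b"template1", max_count=3)` would stop after the first three matches without reading the rest of the file.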

phinjensen commented 6 years ago
original author: Johan Chang
date: 2009-12-16T20:51:55-05:00

Why not just use tmpfs?

phinjensen commented 6 years ago
original author: Greg Sabino Mullane
date: 2009-12-16T22:11:09-05:00

Ezekiel: nice tip about editing just part of the file, then plopping it back in. I'd probably make two copies and diff them before dd-ing back in place.
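
The "two copies and a diff" safeguard could look roughly like this in Python, assuming the chunk was pulled out at a known byte offset (as with the dd step Ezekiel describes). The helper names are hypothetical: `changed_offsets` plays the role of the diff, and `put_back` refuses to write unless the edited chunk has the same length and the file still holds the untouched copy at that offset, which guards against dropping the chunk in the wrong place.

```python
def changed_offsets(original: bytes, edited: bytes) -> list[int]:
    """Byte positions where the edited chunk differs from the pristine copy."""
    if len(original) != len(edited):
        raise ValueError("edited chunk changed length; it cannot go back in place")
    return [i for i, (a, b) in enumerate(zip(original, edited)) if a != b]


def put_back(path: str, offset: int, original: bytes, edited: bytes) -> list[int]:
    """Write `edited` over `path` at `offset` only if the file still contains
    `original` there. Returns the absolute file offsets that changed."""
    diffs = changed_offsets(original, edited)
    with open(path, "r+b") as f:
        f.seek(offset)
        if f.read(len(original)) != original:
            raise ValueError("file contents at offset do not match the saved copy")
        f.seek(offset)
        f.write(edited)  # no truncation: the rest of the file is left alone
    return [offset + i for i in diffs]
```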

Moltonel: thanks, I suspected sed had some simple solution like that.

Johan Chang: No particular reason to use ramdisk over tmpfs. I'll use tmpfs in my next demo to keep things balanced.

phinjensen commented 6 years ago
original author: Anonymous
date: 2009-12-29T07:31:02-05:00

As Moltonel pointed out, sed can make changes based on line numbers.

But please note that in

sed '15s/template0/template1/;17s/template0/template1/'

separating sed commands with ';' is a non-POSIX extension and might not work with all the sed versions out there.

phinjensen commented 6 years ago
original author: Anonymous
date: 2010-01-25T10:30:41-05:00

Rather interesting place you've got here. Thanks for it. I like such topics and everything connected to them. I definitely want to read more soon.