chmln / sd

Intuitive find & replace CLI (sed alternative)
MIT License

feat: allow streaming, line-buffered input and output #287

Status: Open. corneliusroemer opened this pull request 8 months ago.

corneliusroemer commented 8 months ago

The major shortcoming of sd right now is that it doesn't support streaming stdin to stdout. Instead, all input is read into memory, which means sd can't be used on inputs larger than available memory.
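A minimal sketch of the streaming idea. It assumes literal (non-regex) replacement so that only the standard library is needed; sd itself matches regexes, and `replace_streaming` is a hypothetical name, not code from this PR. Reading with `BufRead::read_line` into one reused buffer bounds memory by the longest line rather than by the whole input.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not sd's API): copy `input` to `output` one line
/// at a time, replacing every literal occurrence of `from` with `to`.
/// Memory use is bounded by the longest line, not by the total input size.
fn replace_streaming<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    from: &str,
    to: &str,
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear(); // reuse the same allocation for every line
        // read_line keeps the trailing '\n' (if any), so line endings survive
        if input.read_line(&mut line)? == 0 {
            break; // EOF
        }
        output.write_all(line.replace(from, to).as_bytes())?;
    }
    output.flush()
}
```

From a `main`, this would be called as `replace_streaming(io::stdin().lock(), io::stdout().lock(), "foo", "bar")?`. Because `read_line` requires UTF-8, invalid bytes surface as an `InvalidData` error; a byte-oriented variant would use `read_until(b'\n', ..)` instead.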

This PR is based on @vtronko's work in https://github.com/chmln/sd/issues/100 but adapted for current main.

It's a bit of a graft and rough around the edges (for example, there's no validation around switching off multi-line mode yet), but it's a proof of concept and already useful (roughly 3x faster than sed).

Resolves:

nc7s commented 8 months ago

2 cents:

CosmicHorrorDev commented 8 months ago
  • I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes, as well as anyone who wants to match over multiple lines.

  • I "unified" the stdin and file scenarios into mmaps, which don't have read_line(). Could we implement a BufRead wrapper for them, or should we split them into separate logic again?

You can get a slice from it and wrap it in a std::io::Cursor. That being said, I'm planning on splitting up at least some of the logic again.
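Wrapping a byte slice in `std::io::Cursor` gives it `BufRead`, so `read_line` becomes available. The sketch below uses a plain `&[u8]` as a stand-in for the mapped region; assuming the mapping comes from the `memmap2` crate, `&mmap[..]` would play that role. `Cursor::position` also reports the byte offset reached, which could help map matches back into the file. The helper name `lines_with_offsets` is hypothetical.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: collect each line of `bytes` together with the
/// byte offset where it starts. `bytes` stands in for `&mmap[..]`.
fn lines_with_offsets(bytes: &[u8]) -> io::Result<Vec<(u64, String)>> {
    let mut cursor = Cursor::new(bytes);
    let mut out = Vec::new();
    let mut line = String::new();
    loop {
        let start = cursor.position(); // offset of the line about to be read
        line.clear();
        if cursor.read_line(&mut line)? == 0 {
            break; // reached the end of the slice
        }
        out.push((start, line.clone()));
    }
    Ok(out)
}
```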

nc7s commented 8 months ago

@CosmicHorrorDev:

  • I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes, as well as anyone who wants to match over multiple lines.

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail; something like --multiline might be a better flag name.

I'm curious if a temp file would be useful here.
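The trade-off behind the multi-line question shows up with a pattern that contains a newline: applied to the whole buffer it matches, but a line-at-a-time pass never sees both halves together. This sketch uses literal `str::replace` instead of sd's regex engine, and both function names are hypothetical.

```rust
/// Hypothetical: replace across the entire input at once (what a full read allows).
fn replace_whole(input: &str, from: &str, to: &str) -> String {
    input.replace(from, to)
}

/// Hypothetical: replace one line at a time (what a line-buffered stream allows).
/// `split_inclusive` keeps each line's trailing '\n', so endings are preserved.
fn replace_per_line(input: &str, from: &str, to: &str) -> String {
    input
        .split_inclusive('\n')
        .map(|line| line.replace(from, to))
        .collect()
}
```

For patterns with a bounded match length, a streaming mode could carry a window of previous lines to catch such matches, which is one sense in which multi-line matching need not require a full read; unbounded patterns still would.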

You can get a slice from it and wrap it in a std::io::Cursor. That being said, I'm planning on splitting up at least some of the logic again.

TIL std::io::Cursor ;) What's the reason for splitting?

CosmicHorrorDev commented 8 months ago

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail; something like --multiline might be a better flag name.

I'm curious if a temp file would be useful here.

Allowing line-buffered input opens the door to streaming stdin to stdout (e.g. someone running sd with no input file can type lines and see the output after each one). It's more of a conceptual thing, since a lot of people think of text files line by line.
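For live output, what matters is when bytes leave the writer. The sketch below contrasts `std::io::LineWriter`, which hands each completed line to the inner writer as soon as its `\n` arrives, with `std::io::BufWriter`, which holds data until its buffer fills or it is flushed. The `Output` type and its `live` switch are hypothetical, not an sd option.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical output wrapper for an sd-like tool (not sd's actual type).
/// `Live` forwards each completed line immediately; `Batched` buffers up to
/// its capacity (currently 8 KiB by default) before writing, for throughput.
enum Output<W: Write> {
    Live(LineWriter<W>),
    Batched(BufWriter<W>),
}

impl<W: Write> Output<W> {
    fn new(inner: W, live: bool) -> Self {
        if live {
            Output::Live(LineWriter::new(inner))
        } else {
            Output::Batched(BufWriter::new(inner))
        }
    }

    /// Borrow the underlying writer to see what has actually reached it.
    fn inner(&self) -> &W {
        match self {
            Output::Live(w) => w.get_ref(),
            Output::Batched(w) => w.get_ref(),
        }
    }
}

impl<W: Write> Write for Output<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            Output::Live(w) => w.write(buf),
            Output::Batched(w) => w.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            Output::Live(w) => w.flush(),
            Output::Batched(w) => w.flush(),
        }
    }
}
```

Rust's `Stdout` is documented as line-buffered when connected to a terminal, so in practice the main risk to live output is layering a `BufWriter` on top of it for speed.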

TIL std::io::Cursor ;) What's the reason for splitting?

To support streaming reads from stdin. I was looking through ripgrep's source, and it looks like they still special-case streaming stdin, so we'll likely have to as well if we want that behavior (which I do).
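One way to special-case stdin is to decide up front how each input will be consumed. This is a sketch under stated assumptions, not ripgrep's or sd's design: `Input`, `Source`, and `open_input` are hypothetical, and `fs::read` stands in for mmapping a file.

```rust
use std::fs;
use std::io::{self, BufRead};
use std::path::PathBuf;

/// Hypothetical input description for an sd-like tool.
enum Input {
    Stdin,
    File(PathBuf),
}

/// Hypothetical prepared input. Files are loaded whole (a stand-in for an
/// mmap), so any pattern, multi-line ones included, sees the full contents.
/// Stdin stays a reader consumed incrementally, so output can begin before
/// the input ends.
enum Source {
    Whole(Vec<u8>),
    Stream(Box<dyn BufRead>),
}

fn open_input(input: &Input) -> io::Result<Source> {
    match input {
        Input::File(path) => Ok(Source::Whole(fs::read(path)?)),
        // Since Rust 1.61, Stdin::lock returns StdinLock<'static>,
        // so it can be boxed as a trait object without borrowing issues.
        Input::Stdin => Ok(Source::Stream(Box::new(io::stdin().lock()))),
    }
}
```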

nc7s commented 8 months ago

I did a little test on Cursor<Mmap>::lines(), and it works. We can unify the logic (again?) by iterating over .lines().

(Excuse my "unificationism" XD)
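A caveat worth checking before unifying on `.lines()`: `BufRead::lines` strips each `\n` (and a preceding `\r`) and returns an error on invalid UTF-8, so a rewriting tool would have to re-append endings and could not tell whether the last line had one. `read_until(b'\n', ..)` keeps the raw bytes, endings included. The sketch uses `Cursor<Vec<u8>>` in place of `Cursor<Mmap>`; both get `BufRead` through `AsRef<[u8]>`, which the working `Cursor<Mmap>::lines()` test implies the mmap type provides. The function names are hypothetical.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical: lines as `BufRead::lines` yields them (endings stripped, UTF-8 required).
fn via_lines<T: AsRef<[u8]>>(data: T) -> io::Result<Vec<String>> {
    Cursor::new(data).lines().collect()
}

/// Hypothetical: lines as raw byte chunks ('\n' and any '\r' kept, any bytes allowed).
fn via_read_until<T: AsRef<[u8]>>(data: T) -> io::Result<Vec<Vec<u8>>> {
    let mut cursor = Cursor::new(data);
    let mut out = Vec::new();
    loop {
        let mut chunk = Vec::new();
        if cursor.read_until(b'\n', &mut chunk)? == 0 {
            break; // EOF
        }
        out.push(chunk);
    }
    Ok(out)
}
```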