allenap / shell-quote

A Rust library for shell quoting strings, e.g. for interpolating into a Bash script. This is not as simple as most people think!
Apache License 2.0
5 stars 2 forks source link

Consider changing/adding API to work with `String` instead of `Vec` and `'a, I: Iterator<Item = &'a str>` instead of `&[&str]` #13

Closed mickvangelderen closed 3 months ago

mickvangelderen commented 4 months ago

You can unsafe { some_string.as_mut_vec() } if you guarantee that only valid UTF-8 sequences are written, but maybe you wanted to avoid unsafe altogether?

It would be nice to be able to write Bash::join(["some", "fixed", "args"].iter().chain(std::process::env::args().map(...))]) for example.

allenap commented 3 months ago

I didn't fully understand what you mean in the first paragraph. I'm not averse to unsafe, but can you explain where as_mut_vec() might be useful?

I think the join thing can be done with a function like the one below. It's not as concise as your example, but it does avoid lossy or fallible conversions from &OsStr/OsString to &str/String:

use std::{
    borrow::{Borrow, Cow},
    env::args_os,
    ffi::{OsStr, OsString},
};

use crate::{QuoteExt, Quoter};

pub fn args<Q: Quoter, S: AsRef<OsStr>>(q: Q, init: &[S]) -> OsString {
    init.iter()
        .map(AsRef::as_ref)
        .map(Cow::from)
        .chain(args_os().map(Cow::from))
        .fold(OsString::new(), |mut acc, item| {
            if !acc.is_empty() {
                acc.push(" ");
            }
            let item: &OsStr = item.borrow();
            acc.push_quoted(q, item);
            acc
        })
}

#[cfg(test)]
mod tests {
    use super::args;
    use crate::Bash;

    #[test]
    fn test_args() {
        let quoted = args(Bash, &["foo!bar", "1 2 & 3 4"]).into_string().unwrap();
        assert!(quoted.starts_with("$\'foo!bar\' $\'1 2 & 3 4\' "));
    }
}
mickvangelderen commented 3 months ago

I had not looked closely at the input argument before. From what I understand, Quotable represents anything that can be transformed into a sequence of bytes.

In using shell-quote, I was passing an &str as an input, and expected some T: Deref<&str> back. The input is valid utf-8, and I would like the output to still be valid utf-8.

If the input was always String, you could manipulate the bytes directly through String::as_mut_vec and, assuming you only write valid utf-8 sequences, return a String. I now know this is not the case (the input is not always a String.

Regardless, would it be an idea to add an associated type on Quotable::Output to determine for each quotable type what the output type should be?

Is the output always valid utf-8? In that case you could just return String or Cow<&str> to give the strongest guarantees for free.

About the iterators, I am curious why you wouldn't take an iterator instead of a slice? Do you need to iterate over the args more than once?

allenap commented 3 months ago

I kind of like what you're saying. I don't want to fix the output type with an associated type, but Quotable could capture that it's valid UTF-8 on the way in (distinct from pure binary data). The quoters could include that text with fewer escape sequences in many/most shells. At present, for Bash and fish, UTF-8 is treated as a byte stream, with high bytes (0x80..=0xff) being hex escaped; this is a big overhead for non-English text.

Stepping back a moment: I've had an unwritten goal to try and produce 7-bit ASCII. Recent changes to Sh have broken that because Dash has no escape syntax for high bytes. Pure binary data quoted/escaped by Sh must still be treated as (mostly) pure binary data, but the same data escaped by Bash or Fish is safe to treat as text.

I think 7-bit ASCII is still a useful goal. Escaped 7-bit ASCII is still going to be the most compatible form possible. But (optionally) allowing for valid UTF-8 sequences (code point U+0080 and beyond) to appear in quoted output verbatim would be useful.

About the iterators, I am curious why you wouldn't take an iterator instead of a slice? Do you need to iterate over the args more than once?

Seemed natural to me, that's all. Quotable could be extended to capture iterators too I suppose.

mickvangelderen commented 3 months ago

I don't want to fix the output type with an associated type, but Quotable could capture that it's valid UTF-8 on the way in (distinct from pure binary data).

Are you thinking of making Quotable an enum, or having distinct types QuotableBytes and QuotableString, or both? I have not thought through what the API implications are of either, but an enum could be nice in iterators or collections where the type needs to be homogeneous.

At present, for Bash and fish, UTF-8 is treated as a byte stream, with high bytes (0x80..=0xff) being hex escaped; this is a big overhead for non-English text.

In my derived implementation I can forward the high bytes since I only accept only &str as inputs and only have to care about Bash. I'm not sure how well supported that is though. It is not straightforward to design a consistent and good API that works on multiple platforms for multiple shells.

I think 7-bit ASCII is still a useful goal.

Strong guarantees on the output buffer type would be nice.

Seemed natural to me, that's all. Quotable could be extended to capture iterators too I suppose.

After writing some more code involving shell escaping, I am not sure how necessary iterator support is anymore. However, for functions that already take a slice and could take an iterator, I suppose it would not hurt.

allenap commented 3 months ago

Are you thinking of making Quotable an enum, or having distinct types QuotableBytes and QuotableString, or both? I have not thought through what the API implications are of either, but an enum could be nice in iterators or collections where the type needs to be homogeneous.

I went with making Quotable into an enum in #21 (since merged to master). I think it's fairly intuitive – or at least no less intuitive than before – but the effect of the input type on the output type is a bit opaque. I've documented it, but I think I may revisit it at some point before 1.0.

In my derived implementation I can forward the high bytes since I only accept only &str as inputs and only have to care about Bash. I'm not sure how well supported that is though. It is not straightforward to design a consistent and good API that works on multiple platforms for multiple shells.

The stuff in #21 should be effective for you now, i.e. it forwards the high bytes when they come in via &str or String.

mickvangelderen commented 3 months ago

Feel free to close this issue, it is not very well structured. Thanks for the changes so far!

allenap commented 3 months ago

Okay. I've filed #22 and #23 to track a couple of the things we discussed.

Thanks for the back and forth here. It's clarified my thinking on a lot of things, and made me realise (and fix) a mistake in Sh. Now I ought to release those fixes...