fitzgen / inlinable_string

An owned, grow-able UTF-8 string that stores small strings inline and avoids heap-allocation.
http://fitzgen.github.io/inlinable_string/inlinable_string/index.html
Other
69 stars 11 forks source link

Implement From<String> to avoid copying #4

Closed fitzgen closed 7 years ago

ArtemGr commented 8 years ago

For example?

pub struct InlString (InlinableString);
impl From<String> for InlString {
  fn from (s: String) -> InlString {
    if s.len() <= inlinable_string::inline_string::INLINE_STRING_CAPACITY {
      InlString (InlinableString::Inline (InlineString::from (&s[..])))
    } else {
      InlString (InlinableString::Heap (s))
    }
  }
}
fitzgen commented 8 years ago

Sort of.

We want impl From<String> for InlinableString.

ArtemGr commented 8 years ago

Here's another example:

impl postgres::types::FromSql for InlString {
  fn from_sql<R: Read> (ty: &postgres::types::Type, raw: &mut R, _: &postgres::types::SessionInfo) -> postgres::Result<InlString> {
    use postgres::error::Error; use postgres::types::Type;
    if *ty != Type::Varchar && *ty != Type::Text {
      return Err (Error::from (io::Error::new (std::io::ErrorKind::Other,
        format! ("InlString] Unknown PostgreSQL type: {:?}", ty))))}

    let mut stack = [0u8; 32];
    let got = try! (raw.read (&mut stack));
    if got <= inlinable_string::inline_string::INLINE_STRING_CAPACITY {
      let s = match from_utf8 (&stack[0..got]) {Ok (s) => s, Err (err) => return Err (Error::Conversion (box err))};
      Ok (InlString (InlinableString::Inline (InlineString::from (s))))
    } else {
      let mut buf = Vec::with_capacity (256);
      buf.extend_from_slice (&stack[0..got]);
      try! (raw.read_to_end (&mut buf));
      let s = match from_utf8 (&buf[..]) {Ok (s) => s, Err (err) => return Err (Error::Conversion (box err))};
      Ok (InlString (InlinableString::Heap (String::from (s))))}}

It might be made into a generic From<Read>.

ArtemGr commented 8 years ago

We want impl From for InlinableString.

That's impossible to test outside the inlinable_string crate, AFAIK. (You need the newtype to pin the external traits on external types).

fitzgen commented 8 years ago

This is the inlinable_string crate... I think we are talking past each other...

ArtemGr commented 8 years ago

Code examples I gave aren't from inlinable_string.

maciejhirsz commented 7 years ago

I should check issues before doing PRs. Just run into this, so #6.

I figured there is no point checking the lenght and doing an inline if you already have owned heap.

ArtemGr commented 7 years ago

I figured there is no point checking the lenght and doing an inline if you already have owned heap.

@maciejhirsz

On the contrary!

Using the heap means an extra random memory access!

Consider for (key, value) in btree_map {if key.starts_with ("f") {count += 1}}. If the key is InlinableString without a heap reference then we're doing sequential memory scan. Both the CPUs and the operating systems (with their virtual memory subsystems) are supposed to be able to optimize for this (with various level of prefetching).

Consider for s in vec {if s.starts_with ("f") {count += 1}}. No heap means approximately one memory fetch per two strings (assuming the usual 64-byte cache lines). Heap means approximately three memory fetches per two strings.

If you're using a heap reference than you're fetching random pieces from all over the heap. This is especially bad if some of your memory is swapped out.

Another example, a large long-living map. If the strings are stored inline then the memory pressure from that map is minimal (e.g. your program takes less memory). If the strings are on the heap then you're paying the full heap overhead (padding and metadata of the allocator).

maciejhirsz commented 7 years ago

Hah. That does actually make a lot of sense. I've updated the PR.

fitzgen commented 7 years ago

Fixed in #6

ArtemGr commented 7 years ago

@maciejhirsz Benchmark in https://github.com/rust-lang/rust/pull/40601#issue-215029505 might be interesting regarding how the heap fetches (e.g. sort_large_strings) fare against inline values (e.g. sort_large_big_random), methinks.

ArtemGr commented 7 years ago

Might be a useful template if someone wants to experiment with benchmarking: https://gist.github.com/ArtemGr/cecca70d27178a1f1210df7bd53cf167

fitzgen commented 7 years ago

@ArtemGr are you interested in co-maintaining this crate?

ArtemGr commented 7 years ago

@fitzgen Yes, I wouldn't mind joining in.

fitzgen commented 7 years ago

@ArtemGr Great, thanks!

How does this sound for process: we lock down the master branch and require an approval review (presumably from one of us) and passing tests for anything to land on it?

ArtemGr commented 7 years ago

@fitzgen NP! I never worked with protected branches before, so... sure, looks like an interesting thing to try, at the very least.

fitzgen commented 7 years ago

@ArtemGr cool! Invite sent.