Memoize transforms - Githubissues

NullVoxPopuli commented 7 years ago

String keys don't take up that much space.

Say an average worst case of 15 characters per key, all double-byte Unicode:

15 * 2 = 30 bytes per key (15 * 16 = 240 bits), and there are 1,048,576 bytes in a megabyte.

1,048,576 / 30 ≈ 34,952 keys per megabyte.

That should be a reasonable memory trade off, yeah?

Reason: memoizing the Ruby implementation almost gets to the un-memoized Rust speed.
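The estimate can be checked with a few lines of Ruby. This is a rough sketch only: it assumes 2 bytes (16 bits) per character, counts raw character data, and ignores Ruby's per-object overhead and the cached values themselves. The constant names are illustrative.

```ruby
# Back-of-the-envelope estimate of how many cache keys fit in one megabyte.
# Assumptions (not measured): 15 characters per key, 2 bytes per character.
CHARS_PER_KEY      = 15
BYTES_PER_CHAR     = 2
BYTES_PER_MEGABYTE = 1024 * 1024

bytes_per_key     = CHARS_PER_KEY * BYTES_PER_CHAR
keys_per_megabyte = BYTES_PER_MEGABYTE / bytes_per_key # integer division

puts "#{bytes_per_key} bytes per key, roughly #{keys_per_megabyte} keys per megabyte"
```

Real memory use will be higher, since each Ruby String and Hash entry carries its own overhead, so measuring on a real payload is the safer guide.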

whatisinternet commented 7 years ago

Hey, I'm not sure if you're still going down this road. However, if you are, and you're OK with mixing Ruby with your Rust, you can use some of the memoization code from the memoized Ruby implementation and get the following benchmarks:


Comparison:
    Rust Ruby: camel:   391152.8 i/s
         Rust: camel:   227260.3 i/s - 1.72x  (± 0.04) slower
Memoized Ruby: camel:   172196.6 i/s - 2.27x  (± 0.05) slower
         Ruby: camel:    10371.4 i/s - 37.72x  (± 1.94) slower
                   with 99.9% confidence

Comparison:
Rust Ruby: camel_lower:   384143.0 i/s
   Rust: camel_lower:   217323.1 i/s - 1.77x  (± 0.07) slower
Memoized Ruby: camel_lower:   138650.8 i/s - 2.77x  (± 0.26) slower
   Ruby: camel_lower:     6654.1 i/s - 57.71x  (± 7.56) slower
                   with 99.9% confidence

Comparison:
     Rust Ruby: dash:   380557.4 i/s
          Rust: dash:   215902.1 i/s - 1.76x  (± 0.08) slower
 Memoized Ruby: dash:   158880.2 i/s - 2.39x  (± 0.14) slower
          Ruby: dash:    18190.1 i/s - 20.92x  (± 1.41) slower
                   with 99.9% confidence

Comparison:
Memoized Ruby: unaltered:  5278571.3 i/s
Rust Ruby: unaltered:  5191279.0 i/s - same-ish: difference falls within error
     Ruby: unaltered:  5000077.6 i/s - 1.06x  (± 0.02) slower
     Rust: unaltered:  1025180.4 i/s - 5.15x  (± 0.09) slower
                   with 99.9% confidence

Comparison:
Rust Ruby: underscore:   389582.3 i/s
    Rust: underscore:   211500.8 i/s - 1.84x  (± 0.07) slower
Memoized Ruby: underscore:   140411.7 i/s - 2.77x  (± 0.25) slower
    Ruby: underscore:    99820.4 i/s - 3.90x  (± 0.49) slower
                   with 99.9% confidence

I also subbed out inflections for inflector for the above benchmarks since inflector is faster re: https://github.com/calebmer/inflections/issues/1

NullVoxPopuli commented 7 years ago

this is interesting. what's the code look like?

whatisinternet commented 7 years ago

case_transform.rb

#...
module CaseTransform
  class << self
    def camel_cache
      @camel_cache ||= {}
    end

    def camel_lower_cache
      @camel_lower_cache ||= {}
    end

    def dash_cache
      @dash_cache ||= {}
    end

    def underscore_cache
      @underscore_cache ||= {}
    end

    # Transforms values to UpperCamelCase or PascalCase.
    #
    # @example:
    #    "some_key" => "SomeKey",
    def camel_case(value)
      camel_cache[value] ||= camel(value)
    end

    # Transforms values to camelCase.
    #
    # @example:
    #    "some_key" => "someKey",
    def camel_case_lower(value)
      camel_lower_cache[value] ||= camel_lower(value)
    end

    # Transforms values to dashed-case.
    # This is the default case for the JsonApi adapter.
    #
    # @example:
    #    "some_key" => "some-key",
    def dash_case(value)
      dash_cache[value] ||= dash(value)
    end

    # Transforms values to underscore_case.
    # This is the default case for deserialization in the JsonApi adapter.
    #
    # @example:
    #    "some-key" => "some_key",
    def underscore_case(value)
      underscore_cache[value] ||= underscore(value)
    end

    def unaltered_ruby(value)
      value
    end
  end
end

lib.rs

//...
extern crate inflector;

use inflector::Inflector;
//...

Cargo.toml

[package]
name = "case_transform"
version = "0.1.0"
authors = ["L. Preston Sego III <LPSego3+dev@gmail.com>"]
repository = "https://github.com/NullVoxPopuli/case_transform"

[package.metadata.thermite]
github_releases = true
github_release_type = "latest"

[lib]
name = "case_transform"
crate-type = ["dylib"]

[dependencies]
Inflector = "0.4.0"
ruru = "0.8.0"

NullVoxPopuli commented 7 years ago

oh, I see how you're doing this, I think.

I didn't know you could hook things up like this. legit.

I don't have time to implement this right now, but I'd gladly accept a PR.

A question or two though:

1. What happens when the cache gets large enough to not fit in L2 cache? (I have no idea how many unique keys would need to exist for this.)
2. Would we want someone to opt in to cache usage?

whatisinternet commented 7 years ago

Cool! I'll look into getting something properly together with tests etc. this week or next.

In terms of your questions:

1. Not sure; I haven't measured, but I probably should. I'll see what I can find for reliably measuring L2 cache :neutral_face:
2. Yeah, probably? It feels like something people should be permitted to opt out of, since it could impact memory usage on a larger app.
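One way the opt-out could look is sketched below. This is a hypothetical design, not code from the gem: `MemoizedTransform`, `memoize=`, `fetch`, and `clear_cache!` are made-up names, and defaulting to memoization being on is an assumption.

```ruby
# Hypothetical sketch of a switchable memoization layer.
# None of these names exist in case_transform; they are for illustration only.
module MemoizedTransform
  class << self
    # Set to false to bypass the cache entirely.
    attr_writer :memoize

    # Memoization is on unless explicitly disabled (an assumed default).
    def memoize?
      @memoize.nil? || @memoize
    end

    # One inner hash per transform kind, created on first use.
    def cache
      @cache ||= Hash.new { |hash, kind| hash[kind] = {} }
    end

    # Returns the cached result for (kind, value), or computes it via the block.
    def fetch(kind, value)
      return yield(value) unless memoize?

      cache[kind][value] ||= yield(value)
    end

    # Lets a long-running app release the memory held by the cache.
    def clear_cache!
      cache.clear
    end
  end
end

calls = 0
dash = ->(value) { calls += 1; value.tr('_', '-') }

first  = MemoizedTransform.fetch(:dash, 'some_key', &dash) # computed
second = MemoizedTransform.fetch(:dash, 'some_key', &dash) # served from the cache
calls_with_cache = calls

MemoizedTransform.memoize = false
third = MemoizedTransform.fetch(:dash, 'some_key', &dash)  # computed again
```

A single switch keeps the default fast path while letting memory-sensitive apps turn the cache off or clear it between requests.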

d-unsed commented 7 years ago

@NullVoxPopuli, as I understand it, CaseTransform will be used for JSON serialization, right? If that's correct, then the gem will mostly deal with hashes and their keys. Hash keys are unique, so memoization will help only if some nested hashes have the same keys as their parent hashes.

Do you have any stats about the data that will be passed to CaseTransform?

NullVoxPopuli commented 7 years ago

> Hash keys are unique, so memoization will benefit only if some nested hashes have the same keys as their parent hash

Correct. Where this will probably benefit most is when a JSONAPI resource has multiple records (say 100 of the same model are returned to the JS client): all records will have the same keys and structure, just different values.
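A small sketch of that scenario, using made-up records and a plain `tr` call as a stand-in for the real dash transform (both are assumptions for illustration): with 100 records sharing the same 3 keys, the transform itself runs only 3 times and every other lookup is a cache hit.

```ruby
# 100 hypothetical records with identical keys and different values.
records = Array.new(100) do |i|
  { 'first_name' => "user#{i}", 'last_name' => "surname#{i}", 'created_at' => i }
end

cache  = {} # original key => transformed key
misses = 0  # how many times the transform actually had to run

transformed = records.map do |record|
  record.each_with_object({}) do |(key, value), out|
    new_key = cache[key] ||= begin
      misses += 1
      key.tr('_', '-') # stand-in for the real dash transform
    end
    out[new_key] = value
  end
end

lookups = records.sum(&:size)
puts "#{lookups} key lookups, #{misses} transforms computed"
```

The hit rate grows with the number of records per response, which is why a collection endpoint is the case most likely to benefit.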

NullVoxPopuli / case_transform-rust-extensions

Memoize transforms #4