documentcloud / underscore-contrib

The brass buckles on Underscore's utility belt
MIT License
621 stars 118 forks source link

_.getPath and _.hasPath enhancements #175

Closed AlphaGit closed 9 years ago

AlphaGit commented 10 years ago

I needed _.getPath to support accessing arrays and I did not find it in the docs so I came here to add the feature, but when running the tests, I did find it. So I...

Unresolved concerns my code may generate:

joshuacc commented 10 years ago

@AlphaGit Thanks for the PR!

At first glance, I'm not a fan of the composite array/string paths. Do you have a particular use case that it helps with?

AlphaGit commented 10 years ago

Yes, I see it as the first step to implementing full path parsing, which would be natural for developers to write.

Example, if you need to reach out to the property: countries.greece.cities[0].name, the best scenario would be for you to just enclose that in double quotes and call _.getPath. With these changes we would allow for the first part, then we could add the array parsing.

Anyway, I just saw it handy and added it, but I don't this use case it a strong reason to have it. For most cases of dynamically accessing properties I think a list of array elements would be the simpler to construct. If you believe we should better not approach the composite path way, I can go ahead and take it out.

Thanks for the quick review! Awesome library, by the way.

megawac commented 10 years ago

I'd rather if it just supported full path parsing. I wrote a naive gist for that a while ago

https://gist.github.com/megawac/6162481#file-underscore-lookup-js

function getPath(obj, key) {
    var type = typeof key,
        i = 0,
        length;
    if (type == "string" || type == "number") {
        key = ("" + key).replace(/\[(.*?)\]/g, function (m, key) { //handle case where [1] or ['xa'] may occur
            return "." + key.replace(/^["']|["']$/g, ""); //strip quotes at the start or end of the key
        }).split(".");
    }
    for (length = key.length; i < length; i++) {
        if (_.has(obj, key[i])) 
            obj = obj[key[i]];
        else 
            return void 0;
    }
    return obj;
}

getPath({
  foo: { bar: 'hello, world!' }
}, 'foo["bar"]') // => 'hello, world'
joshuacc commented 10 years ago

Ah. I see. I would be super happy to have getPath and hasPath handle full path notation like countries.greece.cities[0].name. However, I don't think we should support composite paths. Should probably pull that out.

Now that I think about it, the regex for full path notation probably wouldn't be that complicated. If you're interested in adding that to this pull request, go for it!

Thanks for the quick review!

No problem.

Awesome library, by the way.

Most of the credit goes to @fogus . I just help out a bit. :smile:

joshuacc commented 10 years ago

Oooh. :sparkles: @megawac. That looks helpful.

AlphaGit commented 10 years ago

Thanks guys! I'll play with that a little bit now and update the PR in a few minutes.

@megawac If you don't mind, I may update your regex a little bit. I'm thinking of weird cases like { "D'artagnan": "yup" }, which as extreme as they sound, are valid.

Thanks!

megawac commented 10 years ago

Yep, it needs some work to handle quotes. E.g. getPath(foo[bar]) works as does getPath(foo[bar']) etc. Mainly wrote that as a proof of concept for a stackoverflow question ages ago

AlphaGit commented 10 years ago

After fiddling a little bit with the code I think I may go and use an iterative approach. I cannot seem to find a simple way to support weird combinations like { "I'm here to ruin[your.life]": ":)" }.

I know situations like those will be extremely unusual, but I think it would be better to have an edge-case free approach.

AlphaGit commented 10 years ago

Update since I'll need to get back to my daily work for a few hours at least (this was my lunchtime hehe): I've got it almost running, basically a common function parses the string character by character and identifying what needs to be accessed as a set of arrays. Then we just dive into the object to get those and stop once we reach a dead end.

This is how it looks so far, I know it's lengthy and would love to simplify, so I'm open to critique:

  // Internal functions that would not be exposed

  // Will take a path like 'element[0][1].subElement["Hey!.What?"]'
  // and return [element, 0, 1, subElement, "Hey!.What?"]
  function parseJavaScriptPathIntoKeyNames(javascriptPath) {
    var parts = [];
    var terminatorExpected = null;
    var insideIndexer = false;
    var currentPart = "";

    function flushCurrentPart() {
      if (currentPart.length > 0) {
        parts.push(currentPart);
        currentPart = "";
      }
    }

    for (var i = 0; i < javascriptPath.length; i++) {
      var currentChar = javascriptPath[i];
      switch (currentChar) {
      case "[":
        if (!terminatorExpected) {
          flushCurrentPart();

          terminatorExpected = ']';
          insideIndexer = true;
        } else {
          currentPart += currentChar;
        }
        break;
      case "]":
        if (terminatorExpected === "]") {
          flushCurrentPart();

          terminatorExpected = null;
          insideIndexer = false;
        } else {
          currentPart += currentChar;
        }
        break;
      case ".":
        if (!terminatorExpected) {
          flushCurrentPart();
        } else {
          currentPart += currentChar;
        }
        break;
      case "\'":
        if (!terminatorExpected) {
          terminatorExpected = "\'";
        } else if (terminatorExpected === "\'" && insideIndexer) {
          terminatorExpected = ']';
        } else if (terminatorExpected === "\'" && !insideIndexer) {
          flushCurrentPart();

          terminatorExpected = null;
        } else {
          currentPart += currentChar;
        }
        break;
      case "\"":
        if (!terminatorExpected) {
          terminatorExpected = "\"";
        } else if (terminatorExpected === "\"" && insideIndexer) {
          terminatorExpected = ']';
        } else if (terminatorExpected === "\"" && !insideIndexer) {
          flushCurrentPart();

          terminatorExpected = null;
        } else {
          currentPart += currentChar;
        }
        break;
      default:
        currentPart += currentChar;
      } // switch (currentChar)
    } // for

    flushCurrentPart();

    return parts;
  }

// ... 

    getPath: function getPath (obj, ks) {
      var keys = typeof ks === "string" ? parseJavaScriptPathIntoKeyNames(ks) : ks;
      console.log(keys);
      var keysCount = keys.length;
      var currentKeyIndex = 0;

      var currentKey;
      var currentObject = obj;
      do {
        currentKey = keys[currentKeyIndex++];        
        currentObject = currentObject[currentKey];
      } while (currentObject !== undefined && currentObject !== null && currentKeyIndex < keysCount);

      // We didn't finish traversing all properties, hence we 
      // couldn't because of inexisting descendants
      if (currentKeyIndex < keysCount) {
        return void 0;
      } else {
        return currentObject;
      }
    },
AlphaGit commented 10 years ago

PS, with that code, every test is passing except for this:

  var weirdObject = { "D'artagnan": { "[0].[1]": "myValue" } };

  strictEqual(_.getPath(weirdObject, "[\"D'artagnan\"]['[0].[1]']"), "myValue", "should be able to traverse complex property names, from an accessor string");

I know the problem is while parsing the property, but haven't figured out exactly what the issue is.

joshuacc commented 10 years ago

@AlphaGit Nice progress. I know you're not finished yet, but I see some opportunities for cleaning up the code. :grin:

megawac commented 10 years ago

By the way - take a look at this similar library and see if you can make their tests pass https://github.com/mike-marcacci/objectpath

On Wed, Sep 10, 2014 at 3:18 PM, Joshua Clanton notifications@github.com wrote:

@AlphaGit https://github.com/AlphaGit Nice progress. I know you're not finished yet, but I see some opportunities for cleaning up the code. [image: :grin:]

— Reply to this email directly or view it on GitHub https://github.com/documentcloud/underscore-contrib/pull/175#issuecomment-55168043 .

AlphaGit commented 10 years ago

Ok, so here it goes: I reverted my implementation of getPath. I wanted to make getPath and hasPath iterative because I thought it would be simpler to look at and would yield a better performance, but it ended up being highly unreadable. In case anyone's interested, here's an all-test passing getPath:

getPath: function() {
      var keysCount = keys.length;
      var currentKeyIndex = 0;

      var currentKey;
      var currentObject = obj;
      do {
        currentKey = keys[currentKeyIndex++];
        currentObject = currentObject[currentKey];
      } while (currentObject !== undefined && currentObject !== null && currentKeyIndex < keysCount);

      // We didn't finish traversing all properties, hence we 
      // couldn't because of inexisting descendants
      if (currentKeyIndex < keysCount) {
        return void 0;
      } else {
        return currentObject;
      }
}

When sending this to hasPath, nulls and undefined gave me lots of trouble, so I desisted.

Please shoot away for critique and improvement, I'll be happy to make those. (The parser function definitely needs some improvement.)

megawac commented 10 years ago

@AlphaGit for the iterative approach consider doing something like this. It wouldn't be difficult to make a generator that encompasses both hasPath and getPath.

function createPather(pathExists, pathFails) {
    return function path(paths, obj) {

        /* ... */

        return pathExists(obj[path]);
    }
}

var hasPath = createPather(_.constant(true), _.constant(false));
var getPath = createPather(_.identity, _.constant(void 0));
bjmiller commented 10 years ago

I wrote this some time ago for solving this particular problem: https://github.com/bjmiller/delve

It doesn't actually parse paths, but it might give you some ideas on ways to approach the problem.

AlphaGit commented 10 years ago

@megawac Thanks! That approach looks really neat but I don't know if we can use it here. The top-down approach using by parsing the string completely and not left to right assumes that there are definite separators that you can use.

Following that idea, you would start by dividing by periods and square brackets, so that a string like my.property.number[1] would first yield ['my', 'property', 'number[1]'] and then ['my', 'property', 'number', '1'].

However, the separator becomes more complex from a period when you need to consider periods as part of property names, like my.property.number['2.1'], so a split won't help. You can get away by using a regex that will only match periods not inside pairs of delimiters -- but then you need to also consider that these delimiters may not actually be delimiters if they are delimited by further delimiters. For example:my.property.number['3.".4'].

At this points, regexes are of no help, and the top-down approach can still be achieved through look-ahead inspection of the string. Basically, a left-to-right parser with some shortcuts.

With that train of thought in mind, I went straight ahead for the left-to-right parser, since I think it would be easier to work with the code. I can cut down some iterations over the string by doing an indexOf of the next delimiters ([, ], ', ", .) and then keep updating the state matching. But between the several indexOf operation, the Math.min to find the first one and the updates, and the string slicing for the next delimiters, I'm not sure how much we're gaining.

@bjmiller's approach allows for the currying that @megawac suggested, replacing lodash.curry (or is there something else I'm missing?)

Sorry for the wall of text guys. I definitely agree that it looks complex and should be simplified, but for the approaches suggested, I think it would end up creating more complexity than originally proposed. If any of my reasoning is wrong or missing something, please point it out. I'll be happy to apply the changes.

joshuacc commented 10 years ago

@AlphaGit After thinking about it some more, I don't think we need to prevent JS-invalid paths like foo.0 from working. We just need to make sure that proper JS syntax does work, and make sure that the docs only give valid examples. Don't know whether that will simplify your code any. (Hard to tell from my phone. :grin:)

AlphaGit commented 10 years ago

Hi guys, sorry for the delay in this. After requesting support from the community, someone suggested that I could use regexes with backreferences to match delimiters on each side of the expression, which allows me to avoid the manual parsing. The code ended up like this:

// Will take a path like 'element[0][1].subElement["Hey!.What?"]["[hey]"]'
// and return ["element", "0", "1", "subElement", "Hey!.What?", "[hey]"]
function parseJavaScriptPathIntoKeyNames(path) {
  // from http://codereview.stackexchange.com/a/63010/8176
  /**
   * Repeatedly capture either:
   * - a bracketed expression, discarding optional matching quotes inside, or
   * - an unbracketed expression, delimited by a dot or a bracket.
   */
  var re = /\[("|')(.+)\1\]|([^.\[\]]+)/g;

  var elements = [];
  var result;
  while ((result = re.exec(path)) !== null) {
    elements.push(result[2] || result[3]);
  }
  return elements;
}

I left the comments just for reference, but I can remove them if you prefer that. IMO, this looks a lot cleaner. Let me know what you think.

joshuacc commented 10 years ago

@AlphaGit Looks much nicer! Thanks for persevering through to a clean solution. Code review incoming. :grin:

joshuacc commented 10 years ago

@AlphaGit Need to update this line in the docs for _.getPath and _.hasPath:

Keys may be given as an array or as a dot-separated string.

Maybe this?

Keys may be given as an array of key names or as a single string using JavaScript property access notation.

AlphaGit commented 9 years ago

Agreed with everything. Will fix, push and notify.

AlphaGit commented 9 years ago

@joshuacc All set!

joshuacc commented 9 years ago

Looks good to me. Anything to add @megawac?

megawac commented 9 years ago

:ship: expose the string -> path?

AlphaGit commented 9 years ago

Hi @megawac, sorry I did not reply before. Did you want me to expose the string parsing function into the library?

joshuacc commented 9 years ago

@AlphaGit Let's hold off on that for the moment. I'll merge this and open up a new issue to discuss exposing the parsing function.

joshuacc commented 9 years ago

Thanks for the contribution @AlphaGit !

AlphaGit commented 9 years ago

My pleasure! Thanks guys for all the support.