Closed AlphaGit closed 9 years ago
@AlphaGit Thanks for the PR!
At first glance, I'm not a fan of the composite array/string paths. Do you have a particular use case that it helps with?
Yes, I see it as the first step to implementing full path parsing, which would be natural for developers to write.
Example, if you need to reach out to the property: countries.greece.cities[0].name, the best scenario would be for you to just enclose that in double quotes and call _.getPath
. With these changes we would allow for the first part, then we could add the array parsing.
Anyway, I just saw it handy and added it, but I don't this use case it a strong reason to have it. For most cases of dynamically accessing properties I think a list of array elements would be the simpler to construct. If you believe we should better not approach the composite path way, I can go ahead and take it out.
Thanks for the quick review! Awesome library, by the way.
I'd rather if it just supported full path parsing. I wrote a naive gist for that a while ago
https://gist.github.com/megawac/6162481#file-underscore-lookup-js
function getPath(obj, key) {
var type = typeof key,
i = 0,
length;
if (type == "string" || type == "number") {
key = ("" + key).replace(/\[(.*?)\]/g, function (m, key) { //handle case where [1] or ['xa'] may occur
return "." + key.replace(/^["']|["']$/g, ""); //strip quotes at the start or end of the key
}).split(".");
}
for (length = key.length; i < length; i++) {
if (_.has(obj, key[i]))
obj = obj[key[i]];
else
return void 0;
}
return obj;
}
getPath({
foo: { bar: 'hello, world!' }
}, 'foo["bar"]') // => 'hello, world'
Ah. I see. I would be super happy to have getPath
and hasPath
handle full path notation like countries.greece.cities[0].name
. However, I don't think we should support composite paths. Should probably pull that out.
Now that I think about it, the regex for full path notation probably wouldn't be that complicated. If you're interested in adding that to this pull request, go for it!
Thanks for the quick review!
No problem.
Awesome library, by the way.
Most of the credit goes to @fogus . I just help out a bit. :smile:
Oooh. :sparkles: @megawac. That looks helpful.
Thanks guys! I'll play with that a little bit now and update the PR in a few minutes.
@megawac If you don't mind, I may update your regex a little bit. I'm thinking of weird cases like { "D'artagnan": "yup" }, which as extreme as they sound, are valid.
Thanks!
Yep, it needs some work to handle quotes. E.g. getPath(foo[bar])
works as
does getPath(foo[bar'])
etc. Mainly wrote that as a proof of concept for
a stackoverflow question ages ago
After fiddling a little bit with the code I think I may go and use an iterative approach. I cannot seem to find a simple way to support weird combinations like { "I'm here to ruin[your.life]": ":)" }
.
I know situations like those will be extremely unusual, but I think it would be better to have an edge-case free approach.
Update since I'll need to get back to my daily work for a few hours at least (this was my lunchtime hehe): I've got it almost running, basically a common function parses the string character by character and identifying what needs to be accessed as a set of arrays. Then we just dive into the object to get those and stop once we reach a dead end.
This is how it looks so far, I know it's lengthy and would love to simplify, so I'm open to critique:
// Internal functions that would not be exposed
// Will take a path like 'element[0][1].subElement["Hey!.What?"]'
// and return [element, 0, 1, subElement, "Hey!.What?"]
function parseJavaScriptPathIntoKeyNames(javascriptPath) {
var parts = [];
var terminatorExpected = null;
var insideIndexer = false;
var currentPart = "";
function flushCurrentPart() {
if (currentPart.length > 0) {
parts.push(currentPart);
currentPart = "";
}
}
for (var i = 0; i < javascriptPath.length; i++) {
var currentChar = javascriptPath[i];
switch (currentChar) {
case "[":
if (!terminatorExpected) {
flushCurrentPart();
terminatorExpected = ']';
insideIndexer = true;
} else {
currentPart += currentChar;
}
break;
case "]":
if (terminatorExpected === "]") {
flushCurrentPart();
terminatorExpected = null;
insideIndexer = false;
} else {
currentPart += currentChar;
}
break;
case ".":
if (!terminatorExpected) {
flushCurrentPart();
} else {
currentPart += currentChar;
}
break;
case "\'":
if (!terminatorExpected) {
terminatorExpected = "\'";
} else if (terminatorExpected === "\'" && insideIndexer) {
terminatorExpected = ']';
} else if (terminatorExpected === "\'" && !insideIndexer) {
flushCurrentPart();
terminatorExpected = null;
} else {
currentPart += currentChar;
}
break;
case "\"":
if (!terminatorExpected) {
terminatorExpected = "\"";
} else if (terminatorExpected === "\"" && insideIndexer) {
terminatorExpected = ']';
} else if (terminatorExpected === "\"" && !insideIndexer) {
flushCurrentPart();
terminatorExpected = null;
} else {
currentPart += currentChar;
}
break;
default:
currentPart += currentChar;
} // switch (currentChar)
} // for
flushCurrentPart();
return parts;
}
// ...
getPath: function getPath (obj, ks) {
var keys = typeof ks === "string" ? parseJavaScriptPathIntoKeyNames(ks) : ks;
console.log(keys);
var keysCount = keys.length;
var currentKeyIndex = 0;
var currentKey;
var currentObject = obj;
do {
currentKey = keys[currentKeyIndex++];
currentObject = currentObject[currentKey];
} while (currentObject !== undefined && currentObject !== null && currentKeyIndex < keysCount);
// We didn't finish traversing all properties, hence we
// couldn't because of inexisting descendants
if (currentKeyIndex < keysCount) {
return void 0;
} else {
return currentObject;
}
},
PS, with that code, every test is passing except for this:
var weirdObject = { "D'artagnan": { "[0].[1]": "myValue" } };
strictEqual(_.getPath(weirdObject, "[\"D'artagnan\"]['[0].[1]']"), "myValue", "should be able to traverse complex property names, from an accessor string");
I know the problem is while parsing the property, but haven't figured out exactly what the issue is.
@AlphaGit Nice progress. I know you're not finished yet, but I see some opportunities for cleaning up the code. :grin:
By the way - take a look at this similar library and see if you can make their tests pass https://github.com/mike-marcacci/objectpath
On Wed, Sep 10, 2014 at 3:18 PM, Joshua Clanton notifications@github.com wrote:
@AlphaGit https://github.com/AlphaGit Nice progress. I know you're not finished yet, but I see some opportunities for cleaning up the code. [image: :grin:]
— Reply to this email directly or view it on GitHub https://github.com/documentcloud/underscore-contrib/pull/175#issuecomment-55168043 .
Ok, so here it goes: I reverted my implementation of getPath. I wanted to make getPath and hasPath iterative because I thought it would be simpler to look at and would yield a better performance, but it ended up being highly unreadable. In case anyone's interested, here's an all-test passing getPath:
getPath: function() {
var keysCount = keys.length;
var currentKeyIndex = 0;
var currentKey;
var currentObject = obj;
do {
currentKey = keys[currentKeyIndex++];
currentObject = currentObject[currentKey];
} while (currentObject !== undefined && currentObject !== null && currentKeyIndex < keysCount);
// We didn't finish traversing all properties, hence we
// couldn't because of inexisting descendants
if (currentKeyIndex < keysCount) {
return void 0;
} else {
return currentObject;
}
}
When sending this to hasPath, nulls and undefined gave me lots of trouble, so I desisted.
Please shoot away for critique and improvement, I'll be happy to make those. (The parser function definitely needs some improvement.)
@AlphaGit for the iterative approach consider doing something like this. It wouldn't be difficult to make a generator that encompasses both hasPath
and getPath
.
function createPather(pathExists, pathFails) {
return function path(paths, obj) {
/* ... */
return pathExists(obj[path]);
}
}
var hasPath = createPather(_.constant(true), _.constant(false));
var getPath = createPather(_.identity, _.constant(void 0));
I wrote this some time ago for solving this particular problem: https://github.com/bjmiller/delve
It doesn't actually parse paths, but it might give you some ideas on ways to approach the problem.
@megawac Thanks! That approach looks really neat but I don't know if we can use it here. The top-down approach using by parsing the string completely and not left to right assumes that there are definite separators that you can use.
Following that idea, you would start by dividing by periods and square brackets, so that a string like my.property.number[1]
would first yield ['my', 'property', 'number[1]']
and then ['my', 'property', 'number', '1']
.
However, the separator becomes more complex from a period when you need to consider periods as part of property names, like my.property.number['2.1']
, so a split won't help. You can get away by using a regex that will only match periods not inside pairs of delimiters -- but then you need to also consider that these delimiters may not actually be delimiters if they are delimited by further delimiters. For example:my.property.number['3.".4']
.
At this points, regexes are of no help, and the top-down approach can still be achieved through look-ahead inspection of the string. Basically, a left-to-right parser with some shortcuts.
With that train of thought in mind, I went straight ahead for the left-to-right parser, since I think it would be easier to work with the code. I can cut down some iterations over the string by doing an indexOf of the next delimiters ([
, ]
, '
, "
, .
) and then keep updating the state matching. But between the several indexOf operation, the Math.min to find the first one and the updates, and the string slicing for the next delimiters, I'm not sure how much we're gaining.
@bjmiller's approach allows for the currying that @megawac suggested, replacing lodash.curry
(or is there something else I'm missing?)
Sorry for the wall of text guys. I definitely agree that it looks complex and should be simplified, but for the approaches suggested, I think it would end up creating more complexity than originally proposed. If any of my reasoning is wrong or missing something, please point it out. I'll be happy to apply the changes.
@AlphaGit After thinking about it some more, I don't think we need to prevent JS-invalid paths like foo.0
from working. We just need to make sure that proper JS syntax does work, and make sure that the docs only give valid examples. Don't know whether that will simplify your code any. (Hard to tell from my phone. :grin:)
Hi guys, sorry for the delay in this. After requesting support from the community, someone suggested that I could use regexes with backreferences to match delimiters on each side of the expression, which allows me to avoid the manual parsing. The code ended up like this:
// Will take a path like 'element[0][1].subElement["Hey!.What?"]["[hey]"]'
// and return ["element", "0", "1", "subElement", "Hey!.What?", "[hey]"]
function parseJavaScriptPathIntoKeyNames(path) {
// from http://codereview.stackexchange.com/a/63010/8176
/**
* Repeatedly capture either:
* - a bracketed expression, discarding optional matching quotes inside, or
* - an unbracketed expression, delimited by a dot or a bracket.
*/
var re = /\[("|')(.+)\1\]|([^.\[\]]+)/g;
var elements = [];
var result;
while ((result = re.exec(path)) !== null) {
elements.push(result[2] || result[3]);
}
return elements;
}
I left the comments just for reference, but I can remove them if you prefer that. IMO, this looks a lot cleaner. Let me know what you think.
@AlphaGit Looks much nicer! Thanks for persevering through to a clean solution. Code review incoming. :grin:
@AlphaGit Need to update this line in the docs for _.getPath
and _.hasPath
:
Keys may be given as an array or as a dot-separated string.
Maybe this?
Keys may be given as an array of key names or as a single string using JavaScript property access notation.
Agreed with everything. Will fix, push and notify.
@joshuacc All set!
Looks good to me. Anything to add @megawac?
:ship: expose the string -> path?
Hi @megawac, sorry I did not reply before. Did you want me to expose the string parsing function into the library?
@AlphaGit Let's hold off on that for the moment. I'll merge this and open up a new issue to discuss exposing the parsing function.
Thanks for the contribution @AlphaGit !
My pleasure! Thanks guys for all the support.
I needed _.getPath to support accessing arrays and I did not find it in the docs so I came here to add the feature, but when running the tests, I did find it. So I...
Unresolved concerns my code may generate: