adriank / ObjectPath

The agile query language for semi-structured data
http://objectpath.org
MIT License

wildcard function #29

Closed: HyukjinKwon closed this issue 9 years ago

HyukjinKwon commented 9 years ago

I just added a wildcard function for my personal use, but I'd like to share it and, if possible, apply it to your code (and modify it as needed). It uses a findall function in Python and replaces the characters matched by the given regular expression.

Please take a look. :)

adriank commented 9 years ago

Hi,

Thanks for your interest in ObjectPath!

I am accepting pull requests for the JavaScript version of the library. Unfortunately, for the Python version it is impossible right now.

Regarding your code:

Why did you call it a "wildcard function"?

ObjectPath is a query language that is intended to be easy to use for non-techies. I added functions to the language because it was easy to do. Nevertheless, the language should provide an easier way to get things done. May I ask how this particular function helps you? A use case would be greatly appreciated! In many cases, solutions that are common to us programmers can be very confusing for non-programmers.

In this particular example, the question might be:

Why would anyone need a list looking like this:

match("abababababa", "a")
["a","a","a","a","a","a"]

HyukjinKwon commented 9 years ago

First, my apologies for the lack of explanation. To keep it short, I have written down my case below; this is why I called it a wildcard function.

>>> from objectpath import *
>>> tree=Tree({"a": 3333, "b": 1112, "c": 1113})
>>> tree.execute("match('2.*',$.a)")
[]
>>> tree.execute("match('3.*',$.a)")
['3333']
>>>
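
For readers without the patch, the behaviour shown in the session above can be approximated outside ObjectPath. The following is only a minimal sketch, assuming the patched function stringifies its target and passes it to `re.findall`; the standalone `match` helper here is hypothetical and is not part of the ObjectPath API.

```python
import re


def match(pattern, target):
    """Hypothetical stand-in for the patched function (an assumption,
    not ObjectPath code): return every non-overlapping match of
    `pattern` in the string form of `target`."""
    # Non-string values such as the integer 3333 are converted with str()
    return re.findall(pattern, str(target))


data = {"a": 3333, "b": 1112, "c": 1113}

print(match("2.*", data["a"]))  # []
print(match("3.*", data["a"]))  # ['3333']
```

Because `re.findall` always returns a list, a single value that matches still comes back wrapped in a list.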

In my case, there were a number of short documents (from Solr, http://lucene.apache.org/solr/) and I had to find and extract only some of them, much like selecting a number of rows in a database. I found that this project had no suitable function for that, so I added a bit of code.

I know there are obvious problems, such as the return type (a list in the case above) and the fact that in some cases, such as when $.* is the target, it treats all of the document's elements as a single plain string.

I apologise for not implementing it fully enough to work perfectly; I just wanted to share my idea and my approach.

>>> tree.execute('match("{\'a.*",$.*)')
["{'a': 3333, 'c': 1113, 'b': 1112}"]
>>>

I am currently working on an open project (not public yet) at a university, which is a kind of full-text indexing search. I had to demonstrate a POC quickly, so I treated the documents as raw strings and implemented it that way. I also understand that if this function were modified appropriately, the second case above would not occur.

Generally, you're right :). In my case, it could be pretty confusing for non-programmers. But I think non-programmers would find it really handy if such functions were implemented.
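
One way such a modification might look is to test each value on its own instead of stringifying the whole collection. This is a sketch under that assumption only; `match_each` is a hypothetical name and is not part of ObjectPath or of the patch discussed here.

```python
import re


def match_each(pattern, values):
    """Hypothetical helper: keep the values whose string form matches
    `pattern` from its first character."""
    regex = re.compile(pattern)
    # Each value is tested separately, so the container is never
    # flattened into one long string.
    return [v for v in values if regex.match(str(v))]


data = {"a": 3333, "b": 1112, "c": 1113}

print(match_each("3.*", data.values()))   # [3333]
print(match_each("11.*", data.values()))  # [1112, 1113]
```

Returning the original values rather than the matched substrings keeps their types intact, which is closer to selecting rows than to extracting text.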

adriank commented 9 years ago

Can you share an input data snippet and the exact result that you would want to get from OP?

HyukjinKwon commented 9 years ago

I think that may be outside the scope of this project, since the input query, the output, and the input data were totally different. It is also difficult to give you a full spec, as I used it only for demo purposes at the time.

I am no longer using it; instead, I am implementing a query language in another open-source project using ANTLR. If you want to know more about that project (or anything about the data or queries), I would rather send an email as soon as its page becomes available.