fedora-infra / datagrepper

HTTP API for datanommer and the fedmsg bus
https://apps.fedoraproject.org/datagrepper/
GNU General Public License v2.0

Determine way to make advanced queries #55

Open iliana opened 11 years ago

iliana commented 11 years ago

The initial plan for the /submit endpoint was to be able to submit advanced queries (such as "get all the git commits that changed a specfile" or "get wiki edits where users edited their own user pages").

Ideally, the user would be able to write a Python function that filtered message content for them. This is a security hell hole.

We could invent a language, but there's no way it could be complete enough. (I tried a couple of different methods before giving up at Flock 2013.)

iliana commented 11 years ago

My latest thinking on this is to create a module that does queries on a list of JSON objects with this format from MongoDB:

http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/operator/nav-query/

Of course, these query documents can be described as JSON, which can be interpreted safely.
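To make the idea concrete, here is a minimal sketch of such a module, assuming a small subset of MongoDB's query operators and dotted keys for nested fields. The names `matches`, `find`, and `lookup`, and the sample messages, are hypothetical and not part of datagrepper; missing fields are also handled more simply than MongoDB does.

```python
import operator

# A small subset of MongoDB's comparison operators; the real language has many more.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}

MISSING = object()  # sentinel meaning "field not present"


def lookup(doc, dotted_key):
    """Follow a dotted key such as 'msg.user' through nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current


def matches(doc, query):
    """Return True if doc satisfies every clause of the query document."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        else:
            value = lookup(doc, key)
            if value is MISSING:
                return False  # simplification: a missing field never matches
            if isinstance(condition, dict) and all(k.startswith("$") for k in condition):
                for op, operand in condition.items():
                    if op not in OPERATORS:
                        raise ValueError(f"unsupported operator: {op}")
                    try:
                        if not OPERATORS[op](value, operand):
                            return False
                    except TypeError:
                        return False  # incomparable types never match
            elif value != condition:
                return False
    return True


def find(docs, query):
    """Filter a list of JSON-like dicts with a query document."""
    return [doc for doc in docs if matches(doc, query)]


# Hypothetical sample messages, for illustration only.
messages = [
    {"topic": "wiki.article.edit", "msg": {"user": "alice", "title": "User:alice"}},
    {"topic": "wiki.article.edit", "msg": {"user": "bob", "title": "Main_Page"}},
    {"topic": "git.receive", "msg": {"user": "alice", "files": 3}},
]
query = {"topic": "wiki.article.edit", "msg.user": {"$in": ["alice", "carol"]}}
results = find(messages, query)
```

Because the query is plain data (dicts, lists, strings, numbers), it can be parsed with a JSON decoder and evaluated without executing any user-supplied code.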

ralphbean commented 11 years ago

> My latest thinking on this is to create a module that does queries on a list of JSON objects with this format from MongoDB

You and I started thinking about this at about the same time. I like the MongoDB format, but there are other formats too; some are XPath-like, for example. I think I like Mongo's the best.

Whenever we settle on one, we could think about using it for both fedbadges and fedmsg-notifications to have a standard way to programmatically "criticize" fedmsg messages.

akostadinov commented 6 years ago

Can we just support ActiveMQ/JMS-style selector queries? Or queries by key-value pairs?

pypingou commented 6 years ago

@akostadinov anything you would like to do that datagrepper doesn't already support?

akostadinov commented 6 years ago

At least, I couldn't see from the documentation how to query on an arbitrary field. For example:

{
  "username": null, 
  "source_name": "datanommer", 
  "certificate": null, 
  "i": 0, 
  "timestamp": 1513540914.0, 
  "crypto": null, 
  "topic": "/topic/VirtualTopic.qe.ci.jenkins", 
  "headers": {
    "CI_TYPE": "tier-0-testing-done", 
    "expires": "1514145714405", 
    "PRODUCT_TYPE": "treecompose", 
    "CI_STATUS": "passed", 
    "TEST_NAME": "improved-sanity-test", 
    "CI_USER": "atomic-e2e-jenkins", 
    "message-id": "ID:continuous-infra-jenkins.example.com-44193-1511460447792-91371:1:1:1:1", 
    "type": "application/json", 
    "PRODUCT_NAME": "fedora-27-atomic-updates", 
    "subscription": "/queue/Consumer.client-datanommer.openpaas-stage.VirtualTopic.aa.xy.jenkins"
  }, 
  "signature": null, 
  "source_version": "0.8.2", 
  "msg": {
    "REFSPEC": "fedora/27/x86_64/updates/atomic-host", 
    "URL": "https://kojipkgs.fedoraproject.org/atomic/27/", 
    "CHECKSUM": "4dd60d7fda7cc9ee3325384f475648db4dd3a1232c35ce7c45e132710ada1d7c", 
    "IMAGE": "fedora/27/atomic", 
    "PRETTYPRODNAME": "Fedora 27 Atomic Updates", 
    "TESTOPTIONS": "none", 
    "SUBMAN": "prod"
  }
}

How do I filter for TEST_NAME or CI_TYPE? By the way, JMS selectors allow partial matching, but I'm not sure they allow matching at arbitrary depth. Partial matching would be useful when one has a field like version: 3.2.14.5.6 but cares about anything in 3.2.x.
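As an illustration of the kind of filtering being asked for, the following sketch does an exact match on a nested header and a prefix match on a version string. It runs client-side on an already-fetched message; `get_path` and `field_startswith` are hypothetical helpers, not datagrepper features, and the `msg.version` field is made up to mirror the version example above.

```python
def get_path(message, path, default=None):
    """Look up a dotted path such as 'headers.TEST_NAME' in nested dicts."""
    current = message
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def field_startswith(message, path, prefix):
    """Partial match: True if the string at `path` begins with `prefix`."""
    value = get_path(message, path)
    return isinstance(value, str) and value.startswith(prefix)


# Trimmed copy of the message above, plus a hypothetical "version" field.
message = {
    "topic": "/topic/VirtualTopic.qe.ci.jenkins",
    "headers": {"CI_TYPE": "tier-0-testing-done", "TEST_NAME": "improved-sanity-test"},
    "msg": {"version": "3.2.14.5.6"},
}

exact = get_path(message, "headers.TEST_NAME") == "improved-sanity-test"
partial = field_startswith(message, "msg.version", "3.2.")
```

Including the trailing dot in the prefix ("3.2." rather than "3.2") keeps a version such as 3.20.1 from matching by accident.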

pypingou commented 6 years ago

You can use the contains keyword for some level of matching.

akostadinov commented 6 years ago

@pypingou, could you provide an example? I couldn't figure out how to use it, and I don't see it documented.

pypingou commented 6 years ago

This one works for me: https://apps.fedoraproject.org/datagrepper/raw?category=pagure&user=pingou&contains=assigned&delta=36000
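For anyone scripting this, the same URL can be assembled from its parameters with the standard library. This is only a sketch: `build_raw_query` is a hypothetical helper, and the parameter names are the ones that appear in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://apps.fedoraproject.org/datagrepper/raw"


def build_raw_query(**params):
    """Build a /raw query URL, percent-encoding the parameter values."""
    return f"{BASE_URL}?{urlencode(params)}"


url = build_raw_query(category="pagure", user="pingou", contains="assigned", delta=36000)
```

Here `delta` is taken to be a time window in seconds, so 36000 would cover the last ten hours.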

puiterwijk commented 6 years ago

Do note that the contains keyword is going to be severely limited and deprecated over time.

akostadinov commented 6 years ago

@puiterwijk, I see it is not working well. So far I'm only getting 502 errors when trying to use contains. I think something is timing out.

ktdreyer commented 6 years ago

It would be great to support jsonpath expressions on the message bodies. That would provide maximum flexibility for querying based on the body's content, allowing users to query however they wanted.

This is how the fedmsg support works in the Jenkins JMS Messaging plugin: it allows arbitrary jsonpath expressions in order to filter fedmsg events on a topic. For example, I use this to select "new Git tag" github2fedmsg messages for repositories I care about.
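As a rough sketch of what this buys you, the function below imitates only the recursive-descent form of jsonpath (`$..key`), which finds a key at any depth. It is not a jsonpath implementation, `find_key` is a hypothetical name, and the message layout is invented for illustration rather than copied from github2fedmsg.

```python
def find_key(node, key):
    """Yield every value stored under `key` at any depth, similar to `$..key`."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical message body; real github2fedmsg messages may be shaped differently.
message = {
    "topic": "org.fedoraproject.prod.github.create",
    "msg": {
        "ref_type": "tag",
        "ref": "v1.0",
        "repository": {"full_name": "example/project"},
    },
}

is_new_tag = "tag" in find_key(message, "ref_type")
repositories = list(find_key(message, "full_name"))
```

A real deployment would more likely use an existing jsonpath library or database-side support than a hand-rolled walker like this one.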

There's work underway to have Postgres support jsonpath natively, which would really help with performance and avoid the need to store the bodies in MongoDB.