iliana opened this issue 11 years ago
My latest thinking on this is to create a module that runs queries, written in MongoDB's query-document format, against a list of JSON objects:
http://docs.mongodb.org/manual/tutorial/query-documents/ http://docs.mongodb.org/manual/reference/operator/nav-query/
Of course, these query documents are themselves JSON, so they can be parsed and interpreted safely, without executing any user-supplied code.
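A minimal Python sketch of such a module follows. It is an illustration under stated assumptions, not MongoDB's implementation: it supports only a small subset of operators (implicit equality, `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`, `$and`, `$or`) plus dotted field paths, it omits MongoDB's array-matching rules, and the helper names `get_path`, `matches`, and `find` are hypothetical.

```python
import json
import operator

# Sentinel that distinguishes "field absent" from a field whose value is null.
_MISSING = object()

# Ordering operators map directly onto Python comparisons.
_ORDERING = {"$gt": operator.gt, "$gte": operator.ge,
             "$lt": operator.lt, "$lte": operator.le}


def get_path(doc, dotted):
    """Follow a dotted path such as 'headers.CI_TYPE' through nested dicts."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def _apply(op, actual, expected):
    """Evaluate a single operator against the value found in the document."""
    if op == "$exists":
        return (actual is not _MISSING) == bool(expected)
    if op == "$eq":
        return actual == expected
    if op == "$ne":
        return actual != expected
    if op == "$in":
        return actual in expected
    if op == "$nin":
        return actual not in expected
    if op in _ORDERING:
        if actual is _MISSING:
            return False
        try:
            return _ORDERING[op](actual, expected)
        except TypeError:  # e.g. comparing a string with a number
            return False
    raise ValueError(f"unsupported operator: {op}")


def _is_operator_dict(condition):
    """True for conditions like {"$gt": 3}, as opposed to a literal dict."""
    return (isinstance(condition, dict) and bool(condition)
            and all(key.startswith("$") for key in condition))


def matches(doc, query):
    """Return True if doc satisfies every clause of the query document."""
    for key, condition in query.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in condition)
        elif _is_operator_dict(condition):
            actual = get_path(doc, key)
            ok = all(_apply(op, actual, arg) for op, arg in condition.items())
        else:
            ok = get_path(doc, key) == condition  # implicit equality
        if not ok:
            return False
    return True


def find(docs, query):
    """Return the documents that match the query, preserving order."""
    return [doc for doc in docs if matches(doc, query)]


# The query arrives as plain JSON text, so json.loads is all the
# "interpreting" needed; no user code is ever executed.
sample_messages = [
    {"topic": "org.fedoraproject.prod.git.receive", "i": 1},
    {"topic": "org.fedoraproject.prod.wiki.article.edit", "i": 2},
]
user_query = json.loads(
    '{"topic": {"$in": ["org.fedoraproject.prod.wiki.article.edit"]},'
    ' "i": {"$gte": 1}}'
)
print(find(sample_messages, user_query))
# [{'topic': 'org.fedoraproject.prod.wiki.article.edit', 'i': 2}]
```

The sample messages are illustrative only. Keeping the query language declarative like this is what makes it safe to accept from untrusted users: the server only ever walks data structures.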
You and I started thinking about this at about the same time. I like the MongoDB format, but there are other formats too; some are XPath-like. I think I like MongoDB's the best.
Whenever we settle on one, we could think about using it for both fedbadges and fedmsg-notifications, to have a standard way to programmatically "criticize" fedmsg messages.
Can we just support Apache ActiveMQ/JMS-style queries (message selectors)? Or query by key/value pairs?
@akostadinov, is there anything you would like to do that datagrepper doesn't already support?
At least, I couldn't see from the documentation how to query on an arbitrary field. For example, given this message:
{
"username": null,
"source_name": "datanommer",
"certificate": null,
"i": 0,
"timestamp": 1513540914.0,
"crypto": null,
"topic": "/topic/VirtualTopic.qe.ci.jenkins",
"headers": {
"CI_TYPE": "tier-0-testing-done",
"expires": "1514145714405",
"PRODUCT_TYPE": "treecompose",
"CI_STATUS": "passed",
"TEST_NAME": "improved-sanity-test",
"CI_USER": "atomic-e2e-jenkins",
"message-id": "ID:continuous-infra-jenkins.example.com-44193-1511460447792-91371:1:1:1:1",
"type": "application/json",
"PRODUCT_NAME": "fedora-27-atomic-updates",
"subscription": "/queue/Consumer.client-datanommer.openpaas-stage.VirtualTopic.aa.xy.jenkins"
},
"signature": null,
"source_version": "0.8.2",
"msg": {
"REFSPEC": "fedora/27/x86_64/updates/atomic-host",
"URL": "https://kojipkgs.fedoraproject.org/atomic/27/",
"CHECKSUM": "4dd60d7fda7cc9ee3325384f475648db4dd3a1232c35ce7c45e132710ada1d7c",
"IMAGE": "fedora/27/atomic",
"PRETTYPRODNAME": "Fedora 27 Atomic Updates",
"TESTOPTIONS": "none",
"SUBMAN": "prod"
}
}
How do I filter on TEST_NAME or CI_TYPE? By the way, JMS selectors allow partial matching, but I'm not sure they allow matching at arbitrary depth. Partial matching would be useful when a message has a field like version: 3.2.14.5.6 but one cares about anything 3.2.x.
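Applied to a single message, both asks can be sketched in a few lines of Python. This is a hypothetical client-side helper, not a datagrepper feature: `get_path`, `field_matches`, and `find_anywhere` are names invented for illustration. Dotted paths address nested keys such as headers.TEST_NAME, shell-style globs from the standard `fnmatch` module stand in for partial matching, and a recursive search covers matching a key at any depth.

```python
from fnmatch import fnmatchcase


def get_path(message, dotted):
    """Return the value at a dotted path like 'headers.TEST_NAME', or None."""
    current = message
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def field_matches(message, dotted, pattern):
    """True if the field at `dotted` is a string matching the glob `pattern`.

    fnmatchcase treats '*' as "any characters" and '.' literally, so
    '3.2.*' matches '3.2.14.5.6' but not '3.20.1'.
    """
    value = get_path(message, dotted)
    return isinstance(value, str) and fnmatchcase(value, pattern)


def find_anywhere(node, key):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_anywhere(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_anywhere(item, key)


# A trimmed copy of the example message.
message = {
    "topic": "/topic/VirtualTopic.qe.ci.jenkins",
    "headers": {
        "CI_TYPE": "tier-0-testing-done",
        "CI_STATUS": "passed",
        "TEST_NAME": "improved-sanity-test",
    },
    "msg": {"REFSPEC": "fedora/27/x86_64/updates/atomic-host"},
}

print(field_matches(message, "headers.TEST_NAME", "improved-sanity-test"))  # True
print(field_matches(message, "headers.CI_TYPE", "tier-0-*"))                # True
print(list(find_anywhere(message, "CI_STATUS")))                            # ['passed']
print(field_matches({"version": "3.2.14.5.6"}, "version", "3.2.*"))         # True
```

Globs are only one way to express "anything 3.2.x"; a prefix test or a regular expression would work equally well for this case.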
You can use the contains keyword for some level of matching.
@pypingou, could you provide an example? I couldn't figure out how to use it, and I don't see it documented.
Do note that the contains keyword is going to be severely limited and deprecated over time.
@puiterwijk, I see it isn't working well: so far I only get 502 errors when I try to use contains. I think something is timing out.
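Why a contains-style filter is limited can be sketched locally. The sketch assumes, without confirming it, that contains behaves roughly like a substring test over the stored JSON text (similar to SQL `LIKE '%needle%'`); `contains_filter` is a hypothetical name and the second message is made-up data. Under that assumption, the filter cannot tell which field matched, and it depends on how the JSON was serialized. An unanchored `'%…%'` pattern also generally cannot use an ordinary B-tree index, so the database may have to scan every stored message, which could plausibly explain the timeouts.

```python
import json


def contains_filter(messages, needle):
    """Keep messages whose serialized JSON contains `needle` anywhere.

    Assumption: this only approximates a substring search such as SQL
    LIKE '%needle%'; it is not datagrepper's actual implementation.
    """
    return [m for m in messages if needle in json.dumps(m)]


# Illustrative messages; the second TEST_NAME is made up for this example.
messages = [
    {"headers": {"CI_STATUS": "passed", "TEST_NAME": "improved-sanity-test"}},
    {"headers": {"CI_STATUS": "failed", "TEST_NAME": "passed-params-test"}},
]

# A substring search cannot tell which field matched, so both messages come back.
print(len(contains_filter(messages, "passed")))  # 2

# Including the key narrows it down, but the result now depends on exactly how
# the JSON was serialized (json.dumps puts a space after ':' by default).
print(len(contains_filter(messages, '"CI_STATUS": "passed"')))  # 1
print(len(contains_filter(messages, '"CI_STATUS":"passed"')))   # 0

# A field-aware filter has neither problem.
print(len([m for m in messages if m["headers"]["CI_STATUS"] == "passed"]))  # 1
```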
It would be great to support jsonpath expressions on the message bodies. That would provide maximum flexibility for querying based on the body's content, allowing users to query however they wanted.
This is how fedmsg support works in the Jenkins JMS Messaging plugin: it allows arbitrary jsonpath expressions to filter fedmsg events on a topic. For example, I use it to select "new Git tag" github2fedmsg messages for the repositories I care about.
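A real deployment would use a full jsonpath implementation. As a rough, self-contained illustration of the idea, here is a hypothetical evaluator for a tiny subset of the syntax (`$`, `.name`, `.*`, `[n]`, `[*]`), with no filter expressions, recursive descent, or slices. The names `jsonpath_find` and `jsonpath_filter` are invented for this sketch, and the example fields come from the message earlier in the thread.

```python
import re

# One step of the supported subset: ".name", ".*", "[3]" or "[*]".
_STEP = re.compile(r"\.([A-Za-z_][\w-]*|\*)|\[(\d+|\*)\]")


def _parse(path):
    """Split '$.a[0].b' into [('key', 'a'), ('index', 0), ('key', 'b')]."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath expression must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        name, index = m.group(1), m.group(2)
        if name == "*" or index == "*":
            steps.append(("wildcard", None))
        elif name is not None:
            steps.append(("key", name))
        else:
            steps.append(("index", int(index)))
        pos = m.end()
    return steps


def jsonpath_find(document, path):
    """Return every value the path selects (possibly an empty list)."""
    nodes = [document]
    for kind, arg in _parse(path):
        selected = []
        for node in nodes:
            if kind == "key" and isinstance(node, dict) and arg in node:
                selected.append(node[arg])
            elif kind == "index" and isinstance(node, list) and arg < len(node):
                selected.append(node[arg])
            elif kind == "wildcard" and isinstance(node, dict):
                selected.extend(node.values())
            elif kind == "wildcard" and isinstance(node, list):
                selected.extend(node)
        nodes = selected
    return nodes


def jsonpath_filter(messages, path, predicate):
    """Keep messages where at least one selected value satisfies predicate."""
    return [m for m in messages
            if any(predicate(v) for v in jsonpath_find(m, path))]


message = {
    "headers": {"CI_TYPE": "tier-0-testing-done",
                "TEST_NAME": "improved-sanity-test"},
    "msg": {"IMAGE": "fedora/27/atomic", "SUBMAN": "prod"},
}
print(jsonpath_find(message, "$.headers.TEST_NAME"))  # ['improved-sanity-test']
print(jsonpath_find(message, "$.msg.*"))              # ['fedora/27/atomic', 'prod']
print(len(jsonpath_filter([message], "$.headers.CI_TYPE",
                          lambda v: v.startswith("tier-0"))))  # 1
```

Separating the path (which values to look at) from the predicate (what to test them against) keeps the expression language small while still letting each subscriber decide how to compare values.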
There's work underway to have Postgres support jsonpath natively, which would really help with performance and avoid the need to store the bodies in MongoDB.
The initial plan for the /submit endpoint was to be able to submit advanced queries (such as "get all the git commits that changed a specfile" or "get wiki edits where users edited their own user pages").
Ideally, the user would be able to write a Python function that filtered message content for them. However, running user-supplied code on the server is a security hell hole.
We could invent a language, but there's no way it could be complete enough. (I tried a couple of different approaches before giving up at Flock 2013.)