gnocchixyz / gnocchi

Timeseries database
Apache License 2.0
302 stars, 85 forks

Cannot group by without specifying resource type #9

Open jd opened 7 years ago

jd commented 7 years ago

See https://bugs.launchpad.net/gnocchi/+bug/1619412

aalvrz commented 7 years ago

I am thinking that if we do not need to specify the resource type when querying, this should be implemented in a new endpoint?

jd commented 7 years ago

@BigChief45 Either that or using /v1/generic/…. Since it does not work with /v1/generic for now and would work in the future, I don't see this as breaking the API, so that sounds good enough. Or do you see something else?

aalvrz commented 7 years ago

Hm from what I understand, the querying should happen during aggregation. So at the very least something like /v1/aggregation/...?

Then the fields to group by: ?groupby=...

So in the end, something like:

/v1/aggregation/.../metric/%s?groupby=...

So I am thinking that we only need to search for the common fields (name and type) that are given, across all resource types, and use those resources?

scolinas commented 7 years ago

Hi @BigChief45, we're really interested in this feature. It would let us run complex queries that currently require lots of API calls and post-processing. I think you are right: you only need to search for the common fields (name and type) that are given, across all resource types, and use those resources. The only issue I can see is what happens if you find two fields with the same name but different types. That is not my case, but it could happen.
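To make the "same name, different type" concern concrete, here is a minimal sketch of how the common fields could be checked before querying. The `SCHEMAS` mapping, the `badge` resource type, and both helper functions are hypothetical and only stand in for whatever the indexer knows about each resource type; they are not Gnocchi's API.

```python
# Hypothetical attribute schemas per resource type (illustrative values only).
SCHEMAS = {
    "instance": {"department": "string", "application": "string"},
    "volume": {"department": "string", "size": "number"},
    "badge": {"department": "number"},
}


def types_supporting(fields, schemas):
    """Return the resource types that define every requested field."""
    return sorted(
        rtype for rtype, attrs in schemas.items()
        if all(field in attrs for field in fields)
    )


def conflicting_fields(fields, schemas):
    """Return fields declared with more than one type across resource types."""
    conflicts = {}
    for field in fields:
        seen = {attrs[field] for attrs in schemas.values() if field in attrs}
        if len(seen) > 1:
            conflicts[field] = sorted(seen)
    return conflicts


candidates = types_supporting(["department"], SCHEMAS)
conflicts = conflicting_fields(["department"], SCHEMAS)
print(candidates)
print(conflicts)
```

With this kind of check, a request grouping by `department` could either be rejected or restricted to the resource types that agree on the field's type.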

aalvrz commented 7 years ago

@scolinas could you give me some expected sample output for this feature?

aalvrz commented 7 years ago

So I started playing around with this idea.

Basically I am searching resources using generic and details=True, since this way I can get all resources in a single request and start comparing the fields. However, the query inside the request body raises an indexer.ResourceAttributeError when doing the search, since generic resources do not have the fields we use for querying.

I am still thinking about how to handle this from the endpoint, but the _search() method from SearchResourceTypeController would probably need to be changed as well to make this work?

jd commented 7 years ago

@BigChief45 I did not look into the details, but the _search method you are referring to leverages the list_resources from the indexer, which takes a resource type as a first mandatory argument. That allows it to filter on any column in the table of the resource.

So a simple approach is to allow the API to receive a list of resource types, and call list_resources for each resource type, with the attributes filter.
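A minimal sketch of that simple approach, under stated assumptions: the `list_resources` function here is a hypothetical in-memory stand-in that only mirrors the "resource type first, then attribute filter" shape described above, not the real indexer signature, and `list_resources_multi` is an illustrative name.

```python
# Hypothetical in-memory data standing in for the indexer tables.
RESOURCES = {
    "instance": [
        {"id": "i-1", "type": "instance", "department": "sales"},
        {"id": "i-2", "type": "instance", "department": "ops"},
    ],
    "volume": [
        {"id": "v-1", "type": "volume", "department": "sales"},
    ],
}


def list_resources(resource_type, attribute_filter):
    """Return resources of ONE type whose attributes match the filter."""
    return [
        resource for resource in RESOURCES.get(resource_type, [])
        if all(resource.get(k) == v for k, v in attribute_filter.items())
    ]


def list_resources_multi(resource_types, attribute_filter):
    """Call list_resources once per type and concatenate the results."""
    found = []
    for resource_type in resource_types:
        found.extend(list_resources(resource_type, attribute_filter))
    return found


matches = list_resources_multi(["instance", "volume"], {"department": "sales"})
print([resource["id"] for resource in matches])
```

This costs one indexer call per resource type, which is the overhead the single-query optimization would remove.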

A more advanced optimization (that could be in a second patch/PR) would be to modify list_resources to accept not ONE resource type, but several. It would then need to check that each attribute used in filtering has the same type in every table (you don't want to compare VARCHAR and BIGINT from 2 different tables just because they are both called "foobar"). You could then do something like:

SELECT * FROM instance WHERE somefield = 'foobar'
UNION
SELECT * FROM volume WHERE somefield = 'foobar'

And get everything back in one SQL query.
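The single-query idea can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table layouts, the `column_type` and `search_across` helpers, and the sample rows are assumptions made for illustration. One detail to keep in mind is that `UNION` requires each `SELECT` to return the same number of columns, so the sketch selects an explicit common column list instead of `*`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instance (id TEXT, somefield VARCHAR(255), flavor TEXT);
    CREATE TABLE volume (id TEXT, somefield VARCHAR(255), size BIGINT);
    INSERT INTO instance VALUES ('i-1', 'foobar', 'small');
    INSERT INTO instance VALUES ('i-2', 'other', 'large');
    INSERT INTO volume VALUES ('v-1', 'foobar', 10);
""")


def column_type(table, column):
    """Look up the declared type of a column via PRAGMA table_info."""
    # Rows are (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, declared, *_rest in conn.execute(
            f"PRAGMA table_info({table})"):
        if name == column:
            return declared.upper()
    raise KeyError(f"{table} has no column {column}")


def search_across(tables, column, value):
    """Filter several tables on one column in a single UNION query."""
    # Table and column names are trusted constants here; only the
    # compared value is passed as a bound parameter.
    declared = {column_type(table, column) for table in tables}
    if len(declared) != 1:
        raise TypeError(f"{column} has mixed types: {sorted(declared)}")
    query = " UNION ".join(
        f"SELECT id, '{table}' AS type FROM {table} WHERE {column} = ?"
        for table in tables
    )
    return sorted(conn.execute(query, [value] * len(tables)).fetchall())


rows = search_across(["instance", "volume"], "somefield", "foobar")
print(rows)
```

The type check runs before the query is built, so a column that is VARCHAR in one table and BIGINT in another is rejected instead of being compared.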

aalvrz commented 7 years ago

> So a simple approach is to allow the API to receive a list of resource types, and call list_resources for each resource type, with the attributes filter.

Would this list be sent in a similar way to the groupby list?

Edit: I thought this feature was about only specifying the fields, but not any resource types.

jd commented 7 years ago

> Would this list be sent in a similar way to the groupby list?

I don't think you can get it from the URL path anymore, so it'd be something like GET /someurl?instance_type=foo&instance_type=bar so you can get an array from the query arguments.
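As an illustration of how a repeated query argument becomes an array, here is a small sketch using Python's standard `urllib.parse`. The `/someurl` path and the `instance_type` parameter name are taken from the example above and are placeholders, not an existing endpoint.

```python
from urllib.parse import parse_qs, urlsplit

url = "/someurl?instance_type=foo&instance_type=bar"

# parse_qs collects repeated keys into a list, preserving their order.
query = parse_qs(urlsplit(url).query)
resource_types = query.get("instance_type", [])
print(resource_types)  # ['foo', 'bar']
```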

@BigChief45 It's really about what the title says :)

scolinas commented 7 years ago

Hi @BigChief45, sorry for the late answer. For example, say you have a metric (total.cost) that exists on different resource types (instance, instance_disk, instance_network_interface), and those types share some common metadata fields (department, application). You'll probably want to get total.cost grouped by department (or application), but not only for one resource type. So you don't want to specify the resource type, because you want the sum (or any other aggregation) across all of them, but you cannot use generic because generic does not have those fields. What jd suggests (using a list of resource types instead of only one) can be a simplified version of this feature, but it has the drawback that you need to know exactly all the resource types beforehand, which is not always possible.
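A rough sketch of the expected result, with made-up numbers: the resource list, the cost values, and the `sum_by` helper are all hypothetical, and each resource carries a single pre-aggregated total.cost value to keep the example short.

```python
from collections import defaultdict

# Illustrative resources of different types sharing a "department" field.
resources = [
    {"type": "instance", "department": "sales", "total.cost": 10.0},
    {"type": "instance_disk", "department": "sales", "total.cost": 2.5},
    {"type": "instance_network_interface", "department": "ops", "total.cost": 1.0},
    {"type": "instance", "department": "ops", "total.cost": 4.0},
]


def sum_by(resources, metric, groupby):
    """Sum one metric per value of the groupby field, ignoring resource type."""
    totals = defaultdict(float)
    for resource in resources:
        totals[resource[groupby]] += resource[metric]
    return dict(totals)


result = sum_by(resources, "total.cost", "department")
print(result)
```

The grouping key is the shared metadata field alone, so resources of every type that carries `department` contribute to the same bucket.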