closeio / flask-mongorest

Restful API framework wrapped around MongoEngine

Dealing with embedded documents. #103

Closed KeironO closed 7 years ago

KeironO commented 7 years ago

Hey there,

I'm new to MongoDB and I would like to know how I can access an embedded document using flask-mongorest. I know how to do it in the cli, but I can't seem to find documentation here.

Example

Given an output of...


    "data": [
        {
            "adducts": {
                "Anion": {
                    "[M-H]1-": [
                        [
                            349.2093240735, 
                            100.0
                        ], 
                        [
                            350.2126789113, 
                            21.631456585464488
                        ]
                    ]
                }, 
                "Canion": {}, 
                "Nominal": [
                    [
                        350.2093240735, 
                        100.0
                    ], 
                    [
                        351.2126789113, 
                        21.631456585464488
                    ]
                ]
            }, 
            "id": "586bf20b9f0029837dfc9d39", 
            "molecular_formula": "C20H30O5", 
            "name": "Oryzalic acid B", 
            "origins": [
                "Endogenous", 
                "Food"
            ]
        }...

I'd like to filter out anything that has an "anion" from "adducts" from a given value compared to the first element the first list in that given key.

Is this possible in flask-mongorest?

Thanks,

Keiron.

wojcikstefan commented 7 years ago

Hi @KeironO! Before we get to the main part, I don't think you'll be able to map "[M-H]1-" to a field in MongoEngine (which Flask-MongoRest uses under the hood).

Is this possible in flask-mongorest?

All sorts of filtering is possible - you can either override Resource.get_queryset or Resource.get_objects to get a result set that you need. That being said, I don't really understand what you're trying to do exactly, so...

I'd like to filter out anything that has an "anion" from "adducts" from a given value compared to the first element the first list in that given key.

Could you elaborate on this point? Maybe give an example to make it clearer? What does the equivalent filter in MongoDB shell look like and what would you like the API request to look like?

KeironO commented 7 years ago

Would probably be easier to give a simple example of a small subset of what I plan to do, and hopefully I'll be able to pick it up from there.

I'd like to use a ResourceView class utilising @api.register().. to get all documents where Adduct.canion has a length of keys > 0.

KeironO commented 7 years ago

I do apologise, I am new to MongoDB so the learning curve has been a little bit steep.

I've now implemented a "length" variable and am now able to do what I want using the following command.

db.metabolites.find({"adducts.Anion.length" : {$gt : 0}})

However, how can I implement this using the api module?

So for example, querying...

/example/?adduct__equals=Anion

Will return everything in...

db.metabolites.find({"adducts.Anion.length" : {$gt : 0}})

Thanks,

Keiron.

KeironO commented 7 years ago

After a little bit of thought (and reading), I realised that I have problems with my schema. I have since changed it to this.

[
    {
        "accurate_mass": 350.45000749099137, 
        "smiles": "CC(C)(C1CCC23CC(CCC2C1(C)CC(O)=O)C(=C)C3O)C(O)=O", 
        "isotopic_distributions": [
            [
                0.0, 
                100.0
            ], 
            [
                1.003354837799975, 
                21.631456585464488
            ]
        ], 
        "name": "Oryzalic acid B", 
        "origins": [
            "Endogenous", 
            "Food"
        ], 
        "molecular_formula": "C20H30O5", 
        "adduct_weights": {
            "positive": {
                "count": 0, 
                "peaks": []
            }, 
            "neutral": 350.2093240735, 
            "negative": {
                "count": 1, 
                "peaks": [
                    [
                        "[M-H]1-", 
                        349.2093240735
                    ]
                ]
            }
        }
    },...
]

With this, I've been able to implement a modulo operator using the __lt and __gt operators provided through flask-mongorest (example below).

?origins__contains=Endogenous&accurate_mass__lt=1400&accurate_mass__gt=1399

However, this isn't really ideal - I'd ideally like to have an operator named "find" which takes in two values, a tolerance level (a unit of tolerance between the divisor and remainder which will be calculated server side) and a base value.

I've looked around for documentation and can't seem to find any that comprehensively outlines how to set up your own "operator". Any pointers/toy examples would be much appreciated!
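As a rough illustration of how a querystring like the one above decomposes into field, operator and value, here is a minimal sketch using only the standard library. It is not flask-mongorest's own parser; `split_filters` is a hypothetical helper, and treating a key without a `__` suffix as an exact match is an assumption.

```python
from urllib.parse import parse_qsl


def split_filters(querystring):
    """Split 'field__op=value' pairs into (field, op, value) tuples."""
    filters = []
    for key, value in parse_qsl(querystring.lstrip("?")):
        # rpartition splits on the LAST '__', so the suffix is read as the operator
        field, sep, op = key.rpartition("__")
        if not sep:
            # No '__' in the key: assume a plain exact match on the field
            field, op = key, "exact"
        filters.append((field, op, value))
    return filters


parsed = split_filters(
    "?origins__contains=Endogenous&accurate_mass__lt=1400&accurate_mass__gt=1399"
)
```

A custom operator would add one more recognised suffix to this scheme, so a single parameter could stand in for the `__lt` / `__gt` pair.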

wojcikstefan commented 7 years ago

@KeironO is there a MongoDB query that can perform this type of filtering or would you have to fetch all results and only then apply the filtering in Python? If it's the former, it should be fairly easy to subclass the default Operator from https://github.com/closeio/flask-mongorest/blob/master/flask_mongorest/operators.py#L1.

KeironO commented 7 years ago

@wojcikstefan Thank you for taking the time to respond to my queries.

The MongoDB query would look something like this.

db.metabolites.find({"accurate_mass" : { $mod : [1000,998]} })

But ideally I'd like to take in two values - one representing a base value and one representing some sort of tolerance level, with the bounds calculated server-side. Below is a toy example.

/?accurate_mass__find=999,1

Should provide the same output as...

db.metabolites.find({"accurate_mass" : { $mod : [a, b]} })

Where a = 999 +1 and b = 999 - 1 (taken from the query).

KeironO commented 7 years ago

Maybe find would be a bad example of a filter in this instance, but I hope you get the point! :)

wojcikstefan commented 7 years ago

Haven't tested it, but something like this might work:

from flask_mongorest.operators import Operator

class Mod(Operator):
    op = 'mod'

    def prepare_queryset_kwargs(self, field, value, negate):
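        # The querystring value arrives as text (e.g. "200,199"), so convert it to numbers
        divisor, remainder = [int(x) for x in value.split(',')]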
        return {'{}__{}'.format(field, self.op): [divisor, remainder]}

and then in your resource you specify:

filters = {
    'accurate_mass': [Mod],
}

KeironO commented 7 years ago

Thank you again for responding, but this doesn't seem to work(!)

I attempted something not too dissimilar earlier on, and it failed to work then.

Results from ?accurate_mass__mod=200,199 are as follows:

 {
            "accurate_mass": 399.43791648750044, 
            "id": "586ea4419c4fa65248676a43", 
            "molecular_formula": "C22H25NO6", 
            "name": "Colchicine", 
            "origins": [
                "Drug"
            ]
        },...

Which is obviously nowhere near those two values(!)
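For what it's worth, the Colchicine document is the kind of result a modulo filter would be expected to return, since `$mod` selects by remainder rather than by proximity. A quick check in plain Python, assuming the stored float is truncated to an integer before the modulo is taken (an assumption about the server's behaviour, though consistent with the document shown); `matches_mod` is a hypothetical helper:

```python
def matches_mod(value, divisor, remainder):
    """True if value, truncated toward zero, leaves `remainder` after division by `divisor`."""
    return int(value) % divisor == remainder


# Colchicine from the response above: 399 % 200 leaves 199
hit = matches_mod(399.43791648750044, 200, 199)

# Oryzalic acid B from the earlier schema: 350 % 200 leaves 150
miss = matches_mod(350.45000749099137, 200, 199)
```

Selecting masses that lie near a given value calls for a range filter instead.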

KeironO commented 7 years ago

Apologies @wojcikstefan, I seem to have misunderstood the meaning of modular - I probably should have listened more in

The 'correct' MongoDB command now looks like the following.

db.metabolites.find({$and : [{"accurate_mass" : {$gt : 999}}, {"accurate_mass" : {$lt : 1010}}]})

or even (simpler)...

db.metabolites.find({"accurate_mass" : {$gt : 999, $lt : 1010}})

How can I best create an Operator to perform said operation?

wojcikstefan commented 7 years ago

@KeironO you don't need to create a new operator - should be enough to send an API request with ?accurate_mass__gt=999&accurate_mass__lt=1010 (of course after adding ops.Gt, ops.Lt to your resource's filters).

KeironO commented 7 years ago

Hey there,

I guess the point of the question was to do it so there is no need to enter both arguments, and to better understand how to use the operator class going forward.
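A minimal sketch of the single-argument idea, assuming the parameter carries `base,tolerance` and that the result is handed to MongoEngine as `__gt` / `__lt` keyword filters. `window_kwargs` is a hypothetical name; in a custom operator this logic would presumably live in `prepare_queryset_kwargs`.

```python
def window_kwargs(field, value):
    """Turn a 'base,tolerance' string into MongoEngine-style range kwargs."""
    base, tolerance = (float(part) for part in value.split(","))
    return {
        field + "__gt": base - tolerance,  # lower bound, exclusive
        field + "__lt": base + tolerance,  # upper bound, exclusive
    }


# e.g. a request such as ?accurate_mass__find=999,1
kwargs = window_kwargs("accurate_mass", "999,1")
```

Returning both bounds from one parameter keeps the request short, at the cost of a less conventional filter name on the API side.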


wojcikstefan commented 7 years ago

@KeironO I added an explanation in https://github.com/closeio/flask-mongorest/pull/104 using an example similar to yours. Take a look and let me know if it makes it clearer.

KeironO commented 7 years ago

@wojcikstefan Thank you for doing that, but now I'm getting an InvalidQueryError.

127.0.0.1 - - [06/Jan/2017 09:00:20] "GET /metabolites/?name__contains=Ganglioside&accurate_mass__range=1000,999 HTTP/1.1" 500 -
(...)
InvalidQueryError: Cannot resolve subfield or operator range on the field accurate_mass

KeironO commented 7 years ago

Huh, well that's strange. I've just stumbled upon this.

If you enter the query:

/?accurate_mass=300,300

It works, without a hitch. However, entering /?accurate_mass__ppm=300,300 gives the same error as above.

Source code below.

class Ppm(Operator):
    def prepare_queryset_kwargs(self, field, value, negate=False):
        mz, ppm_threshold = [float(x) for x in value.split(',')]

        difference = abs(mz * (ppm_threshold * 1e-6))  # PPM to absolute difference (1 ppm = 1e-6).
        return {
            field + '__gt': mz-difference,
            field + '__lt': mz+difference
        }

Edit: so you need to set the op in order to give the operator a name(!)

wojcikstefan commented 7 years ago

Ah, I updated the PR. Of course you need the op attribute, otherwise Flask-MongoRest doesn't know that __ppm in the querystring should refer to the Ppm Operator :) See #104 again and add op = 'ppm' to your class above.
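To make the role of `op` concrete, here is a self-contained sketch. The `Operator` class below is only a stand-in so the snippet runs without the library (in a real project it would be imported from `flask_mongorest.operators`), its `'exact'` default is an assumption, and `find_operator` is a hypothetical lookup, not the library's own code. The ppm conversion uses 1e-6, on the assumption that parts per million is meant.

```python
class Operator:
    """Stand-in for flask_mongorest.operators.Operator (assumption, see above)."""
    op = "exact"  # assumed default: a bare ?field=value filter

    def prepare_queryset_kwargs(self, field, value, negate=False):
        return {field + "__" + self.op: value}


class Ppm(Operator):
    op = "ppm"  # without this line, '__ppm' has no operator to match against

    def prepare_queryset_kwargs(self, field, value, negate=False):
        mz, ppm_threshold = (float(x) for x in value.split(","))
        difference = abs(mz * ppm_threshold * 1e-6)  # 1 ppm = one part in 10**6
        return {field + "__gt": mz - difference, field + "__lt": mz + difference}


def find_operator(operators, suffix):
    """Hypothetical lookup: pick the operator whose `op` equals the suffix."""
    for cls in operators:
        if cls.op == suffix:
            return cls()
    raise LookupError("no operator registered for '__%s'" % suffix)


kwargs = find_operator([Ppm], "ppm").prepare_queryset_kwargs("accurate_mass", "300,10")
```

If the base class does default to an exact match, a subclass without `op` would answer to the bare field name only, which would explain why `/?accurate_mass=300,300` worked while `__ppm` did not.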

KeironO commented 7 years ago

I really appreciate your assistance, but I'm still confused in regards to working with "child" documents - should I start a new issue or continue with my queries here?

wojcikstefan commented 7 years ago

This issue is getting a bit lengthy, so as soon as we nail the operator here, I'd open a new one with a concise description of the child documents, their schema, the API request you want to use, etc.

KeironO commented 7 years ago

I think it's closely linked to the Operator class, so I'll post it here.

Given data like this (notice the "adduct_weights" : {} )...

"data": [
        {
            "adduct_weights": {
                "negative": {
                    "count": 0, 
                    "peaks": []
                }, 
                "neutral": 136.1252005136, 
                "positive": {
                    "count": 0, 
                    "peaks": []
                }
            }, 
            "id": "586ea4419c4fa6524867693f", 
            "name": "(+)-a-Pinene", 
            "origins": [
                "Endogenous", 
                "Food"
            ]
        }, 

I'd like to check if the value of "adduct_weights.negative.count" > 0. The MongoDB command to do this can be written as.

db.getCollection('metabolites').find({"adduct_weights.negative.count" : {$gt : 0}})

How can I write an operator to access this child document?

(I'm not sure if it's important, but here's my Document class.)

class MetaboliteAdduct(db.DynamicDocument):
    meta = {"collection": "metabolites"}
    name = db.StringField()
    origins = db.ListField(db.StringField())
    adduct_weights = db.DictField()

wojcikstefan commented 7 years ago

Use of DictField is discouraged, because this way flask-mongoengine doesn't know what type to coerce the request param into and treats 0 as '0' (a string), because without any type hints, that's essentially what it is. Try to turn adduct_weights into an embedded document and then negative and positive into embedded documents of that embedded document with count being an IntField.

See https://mongoengine-odm.readthedocs.io/guide/defining-documents.html#embedded-documents for more info about embedded docs.
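Once the nested fields are typed, MongoEngine addresses them by joining the field names and the operator with double underscores. A small sketch of how the dotted MongoDB path from the question maps onto such a keyword; `embedded_filter` is a hypothetical helper, and how the key should be spelled in a resource's `filters` dict is not covered here.

```python
def embedded_filter(path, op, value):
    """Build a MongoEngine-style keyword filter from a dotted MongoDB path."""
    # 'a.b.c' with operator 'gt' becomes the keyword 'a__b__c__gt'
    key = "__".join(path.split(".") + [op])
    return {key: value}


kwargs = embedded_filter("adduct_weights.negative.count", "gt", 0)
# With count declared as an IntField, this could be passed on as
# MetaboliteAdduct.objects(**kwargs)
```

Because `count` would be an IntField, the `0` from the request can be coerced to an integer rather than compared as the string `'0'`.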

KeironO commented 7 years ago

@wojcikstefan

Following that documentation, I've written the following.

class NegativeAdduct(db.EmbeddedDocument):
    count = db.IntField()
    peaks = db.ListField(db.ListField(db.DynamicField()))

class PositiveAdduct(db.EmbeddedDocument):
    count = db.IntField()
    peaks = db.ListField(db.ListField(db.DynamicField()))

class AdductWeights(db.EmbeddedDocument):
    neutral = db.FloatField()
    negative = db.EmbeddedDocumentField(NegativeAdduct)
    positive = db.EmbeddedDocumentField(PositiveAdduct)

class MetaboliteAdduct(db.DynamicDocument):
    meta = {"collection": "metabolites"}
    name = db.StringField()
    origins = db.ListField(db.StringField())
    adduct_weights = db.EmbeddedDocumentField(AdductWeights)

class MetaboliteAdductResource(Resource):
    document = MetaboliteAdduct
    filters = {
        "name" : [ops.Contains, ops.Startswith, ops.Exact],
        "adduct_weights" : [ops.Contains]
    }

@api.register(name="adductsapi", url="/adducts/")
class MetaboliteAdductView(ResourceView):
    resource =  MetaboliteAdductResource
    methods = [methods.List, methods.Fetch]

However, this returns a blank data array when called (/adducts/).

wojcikstefan commented 7 years ago

Change db.DynamicDocument to db.Document. Does the problem persist? Do you have data in that collection?

KeironO commented 7 years ago

Thank you @wojcikstefan for your assistance, but I'm still having an issue.

Now it's throwing up an error.

mongoengine.errors.FieldDoesNotExist
FieldDoesNotExist: The fields "set(['accurate_mass', 'smiles', 'isotopic_distributions', 'molecular_formula'])" do not exist on the document "MetaboliteAdduct"

Adding the field(s) stops the error occurring, but the original issue still persists - of nothing being returned in the data array.

class MetaboliteAdduct(db.Document):
    meta = {"collection": "metabolites"}
    name = db.StringField()
    accurate_mass = db.FloatField()
    smiles = db.StringField()
    isotopic_distributions = db.StringField()
    molecular_formula = db.StringField()
    origins = db.ListField(db.StringField())
    adduct_weights = db.EmbeddedDocumentField(AdductWeights)

To answer your question, data is present in the collection.

KeironO commented 7 years ago

Closing issue, too long and I feel it's gotten past the original question.