chembl / chembl_webservices_2

Source code of the ChEMBL web services.
https://www.ebi.ac.uk/chembl/ws

Access to molecule_form is not filtered properly by parent identifier #113

Closed piotr-gawron closed 7 years ago

piotr-gawron commented 7 years ago

I'm using queries like https://www.ebi.ac.uk/chembl/api/data/molecule_form?parent=CHEMBL660 to obtain information about the hierarchical structure of ChEMBL compounds. This worked for some time, but in the past few days or weeks it has changed: filtering on the parent field no longer works. When I run the query now, I get seemingly random compounds with no relation to the compound in the query.

Some time ago the response looked like this:

<response>
  <molecule_forms>
    <molecule_form>
      <molecule_chembl_id>CHEMBL1445834</molecule_chembl_id>
      <parent>False</parent>
    </molecule_form>
    <molecule_form>
      <molecule_chembl_id>CHEMBL1569</molecule_chembl_id>
      <parent>False</parent>
    </molecule_form>
    <molecule_form>
      <molecule_chembl_id>CHEMBL465617</molecule_chembl_id>
      <parent>False</parent>
    </molecule_form>
    <molecule_form>
      <molecule_chembl_id>CHEMBL660</molecule_chembl_id>
      <parent>True</parent>
    </molecule_form>
  </molecule_forms>
  <page_meta>
    <limit>20</limit>
    <next/>
    <offset/>
    <previous/>
    <total_count>4</total_count>
  </page_meta>
</response>

mnowotka commented 7 years ago

Interesting! Let me have a look...

mnowotka commented 7 years ago

OK, so I changed this endpoint as part of the release. Now, if you want to explore the hierarchy of a given compound, you should call https://www.ebi.ac.uk/chembl/api/data/molecule_form/CHEMBLID.json, so in your example: https://www.ebi.ac.uk/chembl/api/data/molecule_form/CHEMBL660.json

This will return this document:

{
    "molecule_forms": [
        {
            "is_parent": "False",
            "molecule_chembl_id": "CHEMBL465617",
            "parent_chembl_id": "CHEMBL660"
        },
        {
            "is_parent": "True",
            "molecule_chembl_id": "CHEMBL660",
            "parent_chembl_id": "CHEMBL660"
        },
        {
            "is_parent": "False",
            "molecule_chembl_id": "CHEMBL1445834",
            "parent_chembl_id": "CHEMBL660"
        },
        {
            "is_parent": "False",
            "molecule_chembl_id": "CHEMBL1569",
            "parent_chembl_id": "CHEMBL660"
        }
    ],
    "page_meta": {
        "limit": 20,
        "next": null,
        "offset": 0,
        "previous": null,
        "total_count": 4
    }
}

Does it make sense?
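
For anyone updating client code, here is a minimal Python sketch of how the document above could be consumed. The helper names `fetch_molecule_forms` and `split_hierarchy` are made up for illustration and are not part of any ChEMBL client; the sketch also assumes `is_parent` arrives as the string "True" or "False", as in the sample.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/molecule_form"


def fetch_molecule_forms(chembl_id):
    """Download the molecule_form document for one ChEMBL ID (needs network access)."""
    with urlopen(f"{BASE_URL}/{chembl_id}.json") as response:
        return json.load(response)


def split_hierarchy(document):
    """Return (parent ID, sorted list of child IDs) from a molecule_form document."""
    parent_id = None
    children = []
    for form in document["molecule_forms"]:
        # Assumption: the flag is the string "True"/"False", as in the sample.
        if str(form["is_parent"]) == "True":
            parent_id = form["molecule_chembl_id"]
        else:
            children.append(form["molecule_chembl_id"])
    return parent_id, sorted(children)


# Offline example built from the entries of the document shown above.
sample = {
    "molecule_forms": [
        {"is_parent": "False", "molecule_chembl_id": "CHEMBL465617",
         "parent_chembl_id": "CHEMBL660"},
        {"is_parent": "True", "molecule_chembl_id": "CHEMBL660",
         "parent_chembl_id": "CHEMBL660"},
        {"is_parent": "False", "molecule_chembl_id": "CHEMBL1445834",
         "parent_chembl_id": "CHEMBL660"},
        {"is_parent": "False", "molecule_chembl_id": "CHEMBL1569",
         "parent_chembl_id": "CHEMBL660"},
    ]
}

parent, children = split_hierarchy(sample)
print(parent)    # CHEMBL660
print(children)  # ['CHEMBL1445834', 'CHEMBL1569', 'CHEMBL465617']
```

Against the live service, `split_hierarchy(fetch_molecule_forms("CHEMBL660"))` should behave the same way, provided the response keeps the shape shown above.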

piotr-gawron commented 7 years ago

Sure, I will update my code. But maybe you should consider disabling the parent parameter? An error message is better than random output ;-)

mnowotka commented 7 years ago

By design we silently ignore parameters that don't exist within the resource. So https://www.ebi.ac.uk/chembl/api/data/molecule_form.json?foo=bla is equivalent to https://www.ebi.ac.uk/chembl/api/data/molecule_form.json? but https://www.ebi.ac.uk/chembl/api/data/molecule_form.json?parent_chembl_id=CHEMBL660 will return a message informing you that you are not allowed to apply filters on this field. This is because of the nature of a REST API: it's often used in browsers, and invalid parameters may simply have been copied over from some other URL. Since 'parent' does not exist (there is an 'is_parent' flag), it was ignored.
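
The rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour, not the actual web service code: `build_filters`, `FilterNotAllowed`, and the `FILTERABLE` set are hypothetical names and values chosen for the example.

```python
class FilterNotAllowed(Exception):
    """Raised when a query names a real field that is not open to filtering."""


def build_filters(query_params, resource_fields, filterable_fields):
    """Keep usable filters, drop unknown parameters, reject forbidden ones."""
    filters = {}
    for name, value in query_params.items():
        if name not in resource_fields:
            # Unknown parameter (e.g. copied from another URL): silently ignored.
            continue
        if name not in filterable_fields:
            # The field exists on the resource but filtering on it is disabled.
            raise FilterNotAllowed(f"Filtering on '{name}' is not allowed.")
        filters[name] = value
    return filters


# Field names taken from the JSON document in this thread.
FIELDS = {"molecule_chembl_id", "parent_chembl_id", "is_parent"}
# Assumption, purely for illustration: only this field is open to filtering.
FILTERABLE = {"molecule_chembl_id"}

print(build_filters({"foo": "bla"}, FIELDS, FILTERABLE))           # {}
print(build_filters({"parent": "CHEMBL660"}, FIELDS, FILTERABLE))  # {}

try:
    build_filters({"parent_chembl_id": "CHEMBL660"}, FIELDS, FILTERABLE)
except FilterNotAllowed as error:
    print(error)  # Filtering on 'parent_chembl_id' is not allowed.
```

The first two calls produce no filters at all, which is why a query with `parent=CHEMBL660` behaves like an unfiltered request.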

mnowotka commented 7 years ago

BTW: the output is not random at all. Since https://www.ebi.ac.uk/chembl/api/data/molecule_form?parent=CHEMBL660 is equivalent to https://www.ebi.ac.uk/chembl/api/data/molecule_form, it simply shows all the parent-child relations stored in ChEMBL (actually only the first 20 of them, because of the pagination).
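
To see more than the first 20 relations, a client has to walk the pages. Below is a rough Python sketch that derives one URL per page from the `limit` and `total_count` values in `page_meta`. It assumes the service accepts `limit` and `offset` as query parameters, which this thread does not confirm; `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode


def page_urls(base_url, total_count, limit=20):
    """Build one URL per page, stepping the offset by the page size."""
    return [
        f"{base_url}?{urlencode({'limit': limit, 'offset': offset})}"
        for offset in range(0, total_count, limit)
    ]


# Example: 45 records with the default page size of 20 need three requests.
for url in page_urls("https://www.ebi.ac.uk/chembl/api/data/molecule_form.json", 45):
    print(url)
```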

piotr-gawron commented 7 years ago

Well, this parameter was valid in the previous version, so after it was removed from the API, old queries should inform the user that it is no longer valid. That's at least my impression.

In some cases you do return error messages: https://www.ebi.ac.uk/chembl/api/data/molecule_form?parent_chembl_id=CHEMBL660

Btw, you still have the "parent" parameter in your documentation for this method: https://www.ebi.ac.uk/chembl/api/data/molecule_form/schema

Anyway, thanks for the help. I really appreciate it.

mnowotka commented 7 years ago

Yes, I can certainly correct the documentation.

mnowotka commented 7 years ago

OK, so there was a bug, and your issue helped me identify it. The bug is here: https://github.com/chembl/chembl_webservices_2/blob/master/chembl_webservices/resources/molecule_forms.py#L42, where the 'filtering' part still mentions the old field name. I released the fix which makes: