Closed buchanae closed 5 years ago
I will point you to this previous issue: https://github.com/biothings/mygene.info/issues/7
I see. Well, I think you should reconsider the amount of work you're putting on to your clients, especially those of us with statically typed languages. It's also very easy to miss, and therefore leads to subtly broken code (and therefore subtly broken analysis results). Within one week of using this service, I've already encountered this with a coworker.
But, I respect your decision, and still appreciate the great work you guys are doing!
@buchanae I understand that the current data structure will cause more troubles on a statically typed language (Java) than a dynamically typed one (Python). We will think of something we can do at API-level (not the underlying data structure level) to help address this issue.
One thing in my mind is to add a query parameter like alwayslist=field1,field2
to enforce any field to be a list, basically, bring that alwayslist
helper function into the API. Similarly, we can also add an allownull=field1,field2
to make a non-existing field as a null value instead.
Let us know how you think. We are happy to hear the feedback from the statically-typed language side.
That's a clever way to apply a fix without breaking the API, nice!
Although, personally I'd just want a single flag (or separate endpoint) that coerces the whole data structure into consistent types.
Yes, that will be ideal. And this requires us to maintain a complete data schema (it's on our roadmap anyway), more of a long-term solution. We will have some internal discussions on this and figure out how we should prioritize this.
@newgene Awesome, thanks for taking this into consideration.
This was a very annoying thing for me to deal with as well. I implemented a solution using cerberus to validate the json doc and perform coercion on the fields I was interested in using before accessing the data. Example:
from cerberus import Validator
def alwayslist(value):
"""If input value if not a list/tuple type, return it as a single value list."""
if value is None:
return []
if isinstance(value, (list, tuple)):
return value
else:
return [value]
mygene_schema = {
'entrezgene': {'type': 'integer', 'coerce': int, 'required': True},
'taxid': {'type': 'integer', 'required': True, 'coerce': int},
'SGD': {'type': 'string', 'required': False},
'HGNC': {'type': 'string', 'required': False},
'MIM': {'type': 'string', 'required': False},
'MGI': {'type': 'string', 'required': False},
'locus_tag': {'type': 'string', 'required': False},
'RGD': {'type': 'string', 'required': False},
'FLYBASE': {'type': 'string', 'required': False},
'WormBase': {'type': 'string', 'required': False},
'ZFIN': {'type': 'string', 'required': False},
'ensembl': {
'type': 'list', 'coerce': alwayslist, 'required': False,
'schema': {
'type': 'dict',
'allow_unknown': True,
'required': False,
'schema': {'gene': {'type': 'string'},
'protein': {'type': 'list', 'coerce': alwayslist, 'schema': {'type': 'string'}},
'transcript': {'type': 'list', 'coerce': alwayslist, 'schema': {'type': 'string'}},
},
}
}
}
v = Validator(mygene_schema, allow_unknown=True)
v.validate(d)
if v.errors:
raise ValueError(v.errors)
return v.document
Hi @buchanae , the feature that @newgene described above is implemented on mygene.info.
the always_list parameter will make a field always be a list (if it exists), the allow_null parameter will insert a field into the document as null if it doesn't exist.
Here's an example:
http://mygene.info/v3/query?q=cdk?&always_list=_id&fields=_id&allow_null=test.test_field
It's still very much a work-in-progress, so I'll leave the issue open until you get a chance to test it and leave feedback.
This is the python code I used to try this feature:
requests.get("http://mygene.info/v3/query", params={
"q": "symbol:" + gene,
"fields": "ensembl.gene,symbol,genomic_pos",
"species": "human",
"always_list": "ensembl, genomic_pos",
})
Seems to work. Thank you for doing this.
I would prefer to have one parameter that set always_list
and allow_null
for all fields, but I can also build a client-side helper to do that. Thoughts?
@buchanae Great! Glad this works for you. And we are indeed planning on a more "formal" solution for this. I created a new issue: #52 to track the progress and will close this issue.
Looks like the types of some fields can change between string, list, or non-existent. For example, "ensembl" and "alias".
This gets tedious to work with on the client side. I'll need to add checks for these fields now, which will inevitably end up as a helper library I use to wrap all calls. It'd be a lot easier if the API adhered to a strict schema.