Closed brentp closed 6 years ago
the same is true of "pfam", the "GNFMm" fields, "HTA-2_0", "refseq/genomic", "retired", "ipi", "CC", and more. My feature request would be to normalize these to always be lists, even if there is only a single element. This would normalize access, even in dynamic languages, but I understand if that's not how you want the results.
Hi @brentp, for each gene object, we chose to keep the best representation for that gene. That means if a field's value is a single item, it is stored as a single item rather than as a one-element list. Especially for a field where only a few genes have multiple values, forcing every gene to use the same array data type is not efficient.
However, I do understand the concern you raised for the downstream data consumer, because it requires users to check the data type of a field. The solution for this, I think, is to do the on-the-fly conversion at the client-side, instead of storing arrays on our server-side.
If you use our mygene Python client, we provide a helper function called alwayslist, which converts any input value to a list:

    for value in alwayslist(gene_obj['pfam']):
        print(value)

I'm not familiar with how a typed language like Java deals with a JSON object. Would a similar client-side solution apply?
I was writing a library for Go. If the output is consistent, I can have a deserializer automatically generated from the JSON. Without it, everything must be an interface{} (untyped value), so I lose all type safety and discoverability.
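For anyone hitting the same problem in Go, one possible client-side workaround is a custom type that accepts either form. The following is only a rough sketch, not part of any mygene client: `StringList`, `Gene`, and `parseGene` are hypothetical names, the sample documents in `main` are made up, and it assumes the field holds either a single JSON string or an array of strings. It does not restore auto-generated deserializers, since each such field has to be declared by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StringList is a hypothetical helper type: it decodes either a single JSON
// string or a JSON array of strings into a Go slice.
type StringList []string

// UnmarshalJSON implements json.Unmarshaler for StringList.
func (s *StringList) UnmarshalJSON(data []byte) error {
	// Try the array form first.
	var many []string
	if err := json.Unmarshal(data, &many); err == nil {
		*s = many
		return nil
	}
	// Fall back to a single string and wrap it in a one-element slice.
	var one string
	if err := json.Unmarshal(data, &one); err != nil {
		return err
	}
	*s = StringList{one}
	return nil
}

// Gene is a hypothetical, heavily trimmed view of a gene document.
// Assumption: "pfam" is either a string or a list of strings.
type Gene struct {
	Symbol string     `json:"symbol"`
	Pfam   StringList `json:"pfam"`
}

// parseGene decodes one gene document into the typed struct.
func parseGene(data []byte) (Gene, error) {
	var g Gene
	err := json.Unmarshal(data, &g)
	return g, err
}

func main() {
	// Made-up sample documents, one per shape of the "pfam" field.
	docs := []string{
		`{"symbol": "A", "pfam": "PF00001"}`,
		`{"symbol": "B", "pfam": ["PF00001", "PF00002"]}`,
	}
	for _, doc := range docs {
		g, err := parseGene([]byte(doc))
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(g.Symbol, len(g.Pfam), g.Pfam)
	}
}
```

With this, callers can always range over `g.Pfam` regardless of which shape the server returned, at the cost of maintaining the struct definitions manually.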
This makes it harder to parse the JSON in statically typed languages.
For example, see the response from this query:
http://mygene.info/v3/query?q=ATM&fields=all