biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

go.BP.pubmed is sometimes a list and sometimes a number #7

Closed brentp closed 6 years ago

brentp commented 7 years ago

This makes it harder to parse the JSON in statically typed languages.

For example, see this response to the query http://mygene.info/v3/query?q=ATM&fields=all

      "go": {
        "BP": [
          {
            "evidence": "IDA",
            "id": "GO:0006468",
            "pubmed": 15916964,
            "term": "protein phosphorylation"
          },
          {
            "evidence": "IMP",
            "id": "GO:0006974",
            "pubmed": [
              15790808,
              17875758,
              24550317
            ],
            "term": "cellular response to DNA damage stimulus"
          },
brentp commented 7 years ago

The same is true of "pfam", the "GNFMm" fields, "HTA-2_0", "refseq/genomic", "retired", "ipi", "CC", and more. My feature request would be to normalize these to always be lists, even if there is only a single element. This would normalize access, even in dynamic languages, but I understand if that's not how you want the results.

newgene commented 7 years ago

Hi @brentp, for each gene object we chose to keep the most natural representation for that gene: if a field's value is a single item, it is stored as a single item rather than as a one-element list. For a field where only a few genes have multiple values, forcing every gene to use an array is not efficient.

However, I do understand the concern you raised for downstream data consumers, because it requires users to check the data type of a field. The solution, I think, is to do the conversion on the fly on the client side, instead of storing arrays on our server side.

If you use our mygene Python client, we provide a helper function called alwayslist, which converts any input value to a list:

for value in alwayslist(gene_obj['pfam']):
    print(value)
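
For readers without the client at hand, a helper of this kind can be sketched in a few lines. This is an illustration only, not the mygene client's implementation: the name `as_list` is hypothetical, and mapping a missing field to an empty list is an assumption made for this example. The sample records mirror the two go.BP entries quoted earlier in the thread.

```python
def as_list(value):
    """Normalize a field that may be a scalar, a list, or missing.

    Hypothetical helper for illustration; not the mygene client's code.
    """
    if value is None:
        # Assumption: treat an absent field as "no values".
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    # Any other value (number, string, dict) becomes a one-element list.
    return [value]


# Two entries shaped like the go.BP records in the ATM response above:
# one with a single PubMed ID, one with a list of them.
go_bp = [
    {"id": "GO:0006468", "pubmed": 15916964},
    {"id": "GO:0006974", "pubmed": [15790808, 17875758, 24550317]},
]

# Both shapes can now be iterated the same way.
pubmed_ids = [pmid for entry in go_bp for pmid in as_list(entry.get("pubmed"))]
print(pubmed_ids)
```

With this wrapper, calling code no longer needs a type check at every field access, at the cost of one extra function call per field.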

I'm not familiar with how a typed language like Java deals with a JSON object. Would a similar client-side solution apply?

brentp commented 7 years ago

I was writing a library for Go. If the output is consistent, I can have a deserializer automatically generated from the JSON. Without that, everything must be an interface{} (untyped value), so I lose all type safety and discoverability.