biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Deterministic JSON ordering #6

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

Is it possible to make it so the same query always returns the same JSON text?

Currently, sometimes the max_score, total, and took fields appear at the beginning of the JSON and sometimes they appear at the end.

If the implementation is in Python, OrderedDicts would probably fix this issue.
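A minimal sketch of that idea, assuming a Python backend that builds the response as a dictionary before serializing it. The function name `build_response` and the chosen field order are hypothetical and only illustrate how an OrderedDict pins the order of the serialized keys; this is not the actual MyGene.info code.

```python
import json
from collections import OrderedDict


def build_response(hits, total, took, max_score):
    """Serialize a query response with a fixed root-level key order.

    Hypothetical helper: keys are emitted in insertion order, so
    inserting them in the same order every time yields the same text.
    """
    response = OrderedDict()
    response["max_score"] = max_score
    response["took"] = took
    response["total"] = total
    response["hits"] = hits
    return json.dumps(response)


text = build_response(hits=[], total=0, took=1, max_score=0.0)
```

Because `json.dumps` writes keys in the order the mapping yields them, two calls with the same values produce identical text.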

newgene commented 7 years ago

@dhimmel I remember we solved this issue before, exactly using OrderedDicts. We will double-check why it is not in effect right now.

newgene commented 7 years ago

@dhimmel OK, we looked into this issue. It turns out that what we implemented before only ensures alphabetical order of the attributes within each gene object. So you will see that their order is consistent in these queries:

https://mygene.info/v3/gene/1017

https://mygene.info/v3/query?q=a1bg (look at each gene hit under "hits" fields)

For the JSON output from our query endpoint, the field names at the root level (like max_score, total, took, hits) are not guaranteed to appear in a consistent order. We think this is not a problem for our users, so we did not enforce the order, but if that causes any issue for you, please let us know.

dhimmel commented 7 years ago

> We think this is not a problem for our users, so we did not enforce the order, but if that causes any issue for you, please let us know.

It's rather undesirable for the following reasons:

newgene commented 7 years ago

I agree that when people view the JSON output visually (for example, in a browser), the shifting field order can be a bit confusing. But in actual code that processes the JSON output, it really does not matter.

For the "testing" and "diffs" purposes you mentioned, I would always recommend doing them on the native data objects (like a Python dictionary) after parsing the JSON, rather than on the raw JSON text. Comparing raw text won't be reliable anyway (e.g., it can differ just because of indentation).

Having said that, I think we can add additional logic to make the order consistent. There is no downside of that, and it should be pretty trivial to add.
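As a rough illustration of comparing parsed objects rather than raw text, here is a small sketch using only the standard `json` module. The helper name `same_json` and the two sample payloads are made up for the example.

```python
import json


def same_json(text_a, text_b):
    """Return True if two JSON documents parse to equal Python objects.

    Key order and whitespace are ignored because dict equality in
    Python does not depend on insertion order.
    """
    return json.loads(text_a) == json.loads(text_b)


# Same content, different key order and indentation (made-up samples).
compact = '{"total": 1, "took": 3, "hits": [{"_id": "1"}]}'
indented = '{\n  "hits": [{"_id": "1"}],\n  "took": 3,\n  "total": 1\n}'

raw_equal = compact == indented
parsed_equal = same_json(compact, indented)
```

Here the raw strings differ while the parsed objects compare equal. If a stable text form is still needed for diffs, `json.dumps(json.loads(text), sort_keys=True)` is one way to normalize on the client side.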

dhimmel commented 7 years ago

> Having said that, I think we can add additional logic to make the order consistent. There is no downside of that, and it should be pretty trivial to add.

Awesome. I agree with all your points, but still think situations will arise where this is helpful.

newgene commented 7 years ago

@dhimmel A recent update to the BioThings API SDK (the common backend for both MyGene.info and MyVariant.info) has added a feature (thanks to @cyrus0824) that keeps a persistent key order in the JSON object returned from the query endpoint (/v3/query). As I described above, the "hits" key is now always the last one, and the rest of the keys are sorted alphabetically, e.g.,

https://mygene.info/v3/query?q=a1bg
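The ordering described above ("hits" last, all other root-level keys alphabetical) could be sketched roughly as follows. This is an illustrative guess at the logic, not the BioThings SDK implementation; the function name `order_query_response` is hypothetical.

```python
from collections import OrderedDict


def order_query_response(response):
    """Return a copy of a query response with a stable root key order.

    Root-level keys are sorted alphabetically, except "hits", which is
    moved to the end if present. Nested objects are left untouched.
    """
    ordered = OrderedDict(
        (key, response[key]) for key in sorted(response) if key != "hits"
    )
    if "hits" in response:
        ordered["hits"] = response["hits"]
    return ordered


sample = {"took": 1, "hits": [], "total": 0, "max_score": 0.5}
ordered_keys = list(order_query_response(sample))
```

For the sample above, the resulting key order is max_score, took, total, hits, regardless of the order in which the input dictionary was built.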