Open guzman-raphael opened 3 years ago
bare .fetch
on dictionaries gives list-of-dictionaries where each dictionary contains the requested attributes (vs as-dict fetch without attribute arguments, which returns list-of-dictionaries containing all fields) - this is in contrast to numpy fetching which returns multiple N-length arrays containing A1, A2, etc when given an attribute list, and a combined array with fields when fetched with no attribute list
that said, dictionary contents are in fact table-ordered (and not fetch-argument-ordered):
>>> djmon.Event().fetch('ev_datetime', 'ev_type_id', as_dict=True)
[{'ev_datetime': datetime.datetime(2020, 12, 23, 4, 39), 'ev_type_id': 1}, {'ev_datetime': datetime.datetime(2020, 12, 22, 22, 15, 17), 'ev_type_id': 2}, {'ev_datetime': datetime.datetime(2020, 12, 22, 0, 0), 'ev_type_id': 4}]
>>> djmon.Event().fetch('ev_type_id', 'ev_datetime', as_dict=True)
[{'ev_datetime': datetime.datetime(2020, 12, 23, 4, 39), 'ev_type_id': 1}, {'ev_datetime': datetime.datetime(2020, 12, 22, 22, 15, 17), 'ev_type_id': 2}, {'ev_datetime': datetime.datetime(2020, 12, 22, 0, 0), 'ev_type_id': 4}]
our fetch semantics have some 'inner logic' which I think we can explain better (good example is fetch1 which returns dictionary vs fetch which returns ndarray with fields, as fetch1 is often used when doing 'database searching' so dictionary keys are desired, whereas bare fetch is used more in computation, so numpy is desired) - not sure how/if this should/should-not factor into resolving/not resolving this issue ..
Key ordering was not even a python feature until Python 3.6, so at the time this was designed and specified, this would not have come up.
I would not consider this a bug. It was not our spec to preserve the order. Perhaps an enhancement.
@dimitri-yatsenko Good note on Python feature release schedule. Wasn't sure myself when I opened this if it should be a bug or not. With that in mind, I can go ahead and convert this into a new feature request.
Side Note on Key Ordering: OrderedDict
has been around since Python 3.1 before it became expected of normal dict
in Python 3.6. Not sure exactly when we started introducing it into our codebase and if it has been exposed to users.
Yes, we used OrderedDict
in many places in the code, but we took as_dict
to mean to return normal python dicts which were not ordered. The nominal use of dicts in python was and mostly remains as a hash table for looking values by keys. I would mark this as an enhancement since no new functionality is added.
@ixcat Thanks for clearing this up and providing your example. It might be what this is getting at is that the fetch result ordering is not as intuitive as I initially thought.
Perhaps this is simply just a documentation point. We could indicate that fetch results respect table/query attribute order and if it is necessary to respect an attribute order, then user should take note to design the query properly. @ixcat @dimitri-yatsenko What do you think?
agree it would be good to review the fetch/fetch1 documentation since this issue as presented is somewhat 2.5-fold:
BTW, here is the 'confusion' that I'm specifically referring to when it comes to order around fetch
/fetch1
.
Using the Plane
table from my original example:
import datajoint as dj
schema = dj.Schema('test3')
@schema
class Plane(dj.Lookup):
definition = """
# Defines manufacturable plane model types
plane_type : varchar(25) # Name of plane model
---
plane_rows : int # Number of rows in plane model i.e. range(1, plane_rows + 1)
plane_columns : int # Number of columns in plane model; to extract letter we will need these indices
"""
contents = [('B_Airbus', 37, 4), ('F_Airbus', 40, 5)]
# Fetch
print(Plane().fetch()) # returns list of tuples which represent each row; all attributes elements ordered based in order in query
print(Plane().fetch(as_dict=True)) # returns as list of dict which represent each row; all attributes key ordered based in order in query (attribute order not guaranteed before Python 3.6)
print(Plane().fetch('plane_rows', 'plane_type')) # returns list of numpy arrays for each attribute; attribute order defined by argument order in fetch
print(Plane().fetch('plane_rows', 'plane_type', as_dict=True)) # returns as list of dict which represent each row; specific attributes key ordered based in order in query (attribute order not guaranteed before Python 3.6)
# Fetch1
q = Plane() & dict(plane_type='B_Airbus')
print(q.fetch1()) # returns as dict which represent the row; all attributes key ordered based in order in query (attribute order not guaranteed before Python 3.6)
print(q.fetch1('plane_rows', 'plane_type')) # returns tuple which represent the row; specific attributes elements ordered based in argument order in fetch
Essentially, I'd say we either need better documentation on how arg/kwargs impact results or we should address this to have a bit more consistency (though this is a delicate part of DataJoint and backward-compatibility would need to be ensured since directly user facing).
from discussion - current issue only refers to possible reworking of attribute order within retrieved dictionary elements; documentation portion of the issue covered separately (issue: https://github.com/datajoint/datajoint-docs/issues/101 should cover it)
Bug Report
Description
Unsure exactly if this is a bug or if we are not guaranteeing a key order when
as_dict=True
.What was observed is that in some cases, when fetching with
fetch(..., as_dict=True)[0].values()
the attribute order is inconsistent compared tofetch1(...)
. Perhaps is it related to the join? Intent behind doing this is that sometimes it is desired to have data arranged together per row.Reproducibility
Include:
Steps:
Error Stack:
Expected Behavior
Order to be consistent over varying output formats.
Additional Research and Context
This was noticed when preparing an answer to this StackOverflow question.