Closed ivansg44 closed 5 years ago
Looks like great progress. I see the resource "contents" field is a JSONField, specifically a jsonb column, which means fast access via Postgres:
line 73, https://github.com/GenEpiO/geem/blob/master/geem/models.py
contents = JSONField() # Note, this takes a while to save because postgres creates queryable structure of contents?
How do we make sure it is "GiST" indexed (or GIN, which the passage quoted below suggests for JSONField), as described at the top of: https://docs.djangoproject.com/en/2.1/ref/contrib/postgres/fields/#django.contrib.postgres.fields.JSONField
"Index and Field.db_index both create a B-tree index, which isn’t particularly helpful when querying complex data types. Indexes such as GinIndex and GistIndex are better suited, though the index choice is dependent on the queries that you’re using. Generally, GiST may be a good choice for the range fields and HStoreField, and GIN may be helpful for ArrayField and JSONField."
As for the id={id} parameter, that's OK, but ideally we would add a path for it in api/urls.py instead. It depends on all the operations (get / push / delete) that will be piled on soon, though. What would they look like as standard REST? In the future we will probably need to return multiple items, i.e. ids=x,y,z. We can discuss tomorrow.
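On the indexing question above: a GIN index can be declared on the model itself. This is only a minimal sketch, assuming Django 2.1 with django.contrib.postgres. The model name Package is a placeholder for whichever model in geem/models.py owns the contents field, and the index only exists in the database once makemigrations and migrate have been run.

```python
# geem/models.py (sketch; `Package` is a placeholder model name)
from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class Package(models.Model):
    # Stored by Postgres as a jsonb column
    contents = JSONField()

    class Meta:
        # Ask Django to create a GIN index on the jsonb column
        # instead of the default B-tree index.
        indexes = [GinIndex(fields=['contents'])]
```

A GIN index on a jsonb column mainly speeds up containment and key-existence lookups (contains, has_key, has_keys, has_any_keys), so whether it helps depends on the queries the API ends up making.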
@ddooley
See latest commit.
Nice. In the future, id as a parameter might support multiple identifiers, so it makes sense to keep that as a second way.
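To make the multiple-identifier idea concrete, here is a rough sketch of how a query string could be read so that both id=x and ids=x,y,z are accepted. It uses only the standard library. The helper name parse_term_ids and the plural ids parameter are assumptions for illustration, not part of the current API.

```python
from urllib.parse import parse_qs


def parse_term_ids(query_string):
    """Collect term ids from `id=` and/or `ids=` (hypothetical helper)."""
    params = parse_qs(query_string)
    ids = []
    # Single-item form: id=x (the parameter may be repeated)
    ids.extend(params.get("id", []))
    # Multi-item form: ids=x,y,z
    for chunk in params.get("ids", []):
        ids.extend(part.strip() for part in chunk.split(",") if part.strip())
    # Drop duplicates while keeping the requested order
    return list(dict.fromkeys(ids))
```

For example, parse_term_ids("format=json&ids=x,y,z") returns ["x", "y", "z"], and a query string with neither parameter returns an empty list, which a view could treat as "return the whole specifications field".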
@ddooley
I have investigated the possibility of fetching items from packages through the API, without loading the entire specifications field to JSON.

api/resources/{pk}/specifications/?format=json produces a JSON object containing the entire specifications field of a package with id {pk}.

api/resources/{pk}/specifications/?format=json&id={id} produces a JSON object containing a single term with id {id} from the specifications. This does not load the entire specifications to memory before filtering it down to one item. Instead, it constructs a QuerySet for that specific item in the specifications JSON object. QuerySet objects do not touch the database until they are actually evaluated. In the context of this pull request, the QuerySet object is evaluated at line 108 of geem/views.py, which is after I have specified I only want one term extracted.

There is no "contained" way to add id to the URL in the shape of api/resources/{pk}/specifications/{id}. We would have to hard-code it into api/urls.py, as seen in the Django documentation here. Let me know if having id as a query parameter is acceptable, or you want it hard-coded into a URL.
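For comparison, the hard-coded variant could look roughly like the following. This is a sketch only: the view name specification_term and the term_id argument are hypothetical, and it assumes the route lives in api/urls.py next to a views module.

```python
# api/urls.py (sketch; `views.specification_term` is a hypothetical view)
from django.urls import path

from . import views

urlpatterns = [
    # Matches api/resources/{pk}/specifications/{id}/ once this module is
    # included under the api/ prefix. The view would receive `pk` as an int
    # and `term_id` as a string.
    path(
        'resources/<int:pk>/specifications/<str:term_id>/',
        views.specification_term,
    ),
]
```

Note that the str converter matches any non-empty string without a slash, so term ids that contain "/" would need a different converter or would have to stay in the query string.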