Closed ivansg44 closed 5 years ago
@ddooley
The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code? If later we want to open it up to API, then we offer up a version that is shielded with the extra pk check in self._get_modifiable_packages(request). The translate_iri() may be getting called a bit more soon, not just for the term_id, but also any components that have term_ids.
Also, add a pk filter option up front to _get_modifiable_packages() ? Or is it that the "return Package.objects.filter(Q(owner=user))" line is a delayed query call? Otherwise if it isn't then we're loading megabytes of object data from the database where users own one or more ontologies + packages. Not efficient.
Another move would be to make a separate self._get_modifiable_package_ids(request) which doesn't return all the package data, just the list of ids user does manage. So it is a call used only for security check; fetch of package contents is separate.
Also about the "cursor.execute("update geem_package set contents= ... )" statement - its nice that postgresql/django lets one surgically insert the new key/value!
The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code?
Yep, it's only server side call! I will get rid of the security check.
Also, add a pk filter option up front to _get_modifiable_packages() ? Or is it that the "return Package.objects.filter(Q(owner=user))" line is a delayed query call? Otherwise if it isn't then we're loading megabytes of object data from the database where users own one or more ontologies + packages. Not efficient.
QuerySet.filter
does indeed return a new QuerySet
(i.e., a delayed query call), so I don't think adding a pk filter to _get_modifiable_packages
, and implementing _get_modifiable_package_ids
is necessary. What do you think?
The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code? If later we want to open it up to API, then we offer up a version that is shielded with the extra pk check in self._get_modifiable_packages(request). The translate_iri() may be getting called a bit more soon, not just for the term_id, but also any components that have term_ids.
I just realised that it's not so much a security call, as it is me getting the package with the @context
I'm interested in. I could replace the pk
parameter in _translate_iri with a QuerySet
parameter, and pass the already queried package from create_specifications
instead, so _translate_iri
does not have to query it again. However, I will still need the pk
for the raw SQL query in _translate_iri
, so maybe I could pass both the already-queried package and pk
as parameters? Either way, I avoid the redundant query as you requested.
Can _translate_iri just use a query against package pk x context.[prefix] rather than loading the whole dictionary and passing it? Thinking postgresql would be efficient that way. But no bigie if just passing @context dictionary - in fact that might be preferable if calling routinge repeatedly with questions...
Oh, that's a good idea.
@ddooley See latest commit. I made _translate_iri more robust and efficient.
In both get_specifications() and delete_specifications() this code:
queryset = self._get_modifiable_packages(request)
queryset = queryset.filter(pk=pk)
... if queryset.count() == 0:
If that conditional is run that actually triggers the query? (Or is it so sophisticated that it can apply the filterr, get a count, but not set up the returned field data?) So I really think a separate _get_modifiable_package_ids() function should be done to be efficient. It will be a query that returns just ids, rather than the "return Package.objects.filter(Q(owner=user))" thingy.
You're right that it triggers a query, but it seems to be a SELECT COUNT(*)
query on the server side. I believe the only thing that gets loaded to front-end memory is an int
count of the packages in queryset
, as opposed to the actual package objects. I inferred this from the documentation here, which seems to advocate QuerySet.count
as being very efficient. Thoughts?
Ok Great, i see count doesn't trigger fetch columns specified in original query. I found this discussion https://wiki.postgresql.org/wiki/Slow_Counting which confirms that with a WHERE clause, like user=xyz this is quick, and only with no condition they say it is slower, but still doesn't return any data.
Added the following API calls:
api/resources/{pk}/delete/specifications/
Deletes all terms from specifications of package with id
{pk}
Implemented in
delete_specifications
api/resources/{pk}/delete/specifications/{term_id}
Deletes term with id
term_id
from specifications of package with id{pk}
Implemented in
delete_specifications
api/resources/{pk}/create/specifications/{term}
Adds
{term}
to specifications of package with id{pk}
Term must be a
JSON
objectTerm must have an
id
attributeid
attribute must be a full IRIImplemented in
create_specifications
I implemented two private helper functions:
_get_modifiable_packages
Constructs
QuerySet
of packages user has permission to modifyCalled in
delete_specifications
andcreate_specifications
, to prevent user from deleting or adding terms in packages they do not own_translate_iri
Shortens an IRI through a substitution prefix
First looks for a substitution prefix in a specified package's
@context
If none is present, attempts to create one and add it to the specified package's
@context
Called in
create_specifications
, where it shorten's the user-specifiedid
in the term being addedI also renamed some variables and edited some comments for clarity.
The entire specifications of any package are not loaded into the user's browser through any of the changes I have implemented. Specific, targeted queries are executed on the backend at the following lines in
views.py
:119
158
167
173
178
204
231
242
274
309
The most data loaded into the user's browser is at line
274
, when the data inside the@context
field of a package is loaded to check for substitution prefixes.