ivansg44 commented 5 years ago

Added the following API calls:

api/resources/{pk}/delete/specifications/
- Deletes all terms from specifications of package with id {pk}
- Implemented in delete_specifications
api/resources/{pk}/delete/specifications/{term_id}
- Deletes term with id term_id from specifications of package with id {pk}
- Implemented in delete_specifications
api/resources/{pk}/create/specifications/{term}
- Adds {term} to specifications of package with id {pk}
- Term must be a JSON object
- Term must have an id attribute
- id attribute must be a full IRI
- Implemented in create_specifications

I implemented two private helper functions:

_get_modifiable_packages
- Constructs QuerySet of packages user has permission to modify
- Called in delete_specifications and create_specifications, to prevent user from deleting or adding terms in packages they do not own
_translate_iri
- Shortens an IRI through a substitution prefix
  - First looks for a substitution prefix in a specified package's @context
  - If none is present, attempts to create one and add it to the specified package's @context
- Called in create_specifications, where it shorten's the user-specified id in the term being added

I also renamed some variables and edited some comments for clarity.

The entire specifications of any package are not loaded into the user's browser through any of the changes I have implemented. Specific, targeted queries are executed on the backend at the following lines in views.py:

The most data loaded into the user's browser is at line 274, when the data inside the @context field of a package is loaded to check for substitution prefixes.

ivansg44 commented 5 years ago

@ddooley

ddooley commented 5 years ago

The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code? If later we want to open it up to API, then we offer up a version that is shielded with the extra pk check in self._get_modifiable_packages(request). The translate_iri() may be getting called a bit more soon, not just for the term_id, but also any components that have term_ids.

Also, add a pk filter option up front to _get_modifiable_packages() ? Or is it that the "return Package.objects.filter(Q(owner=user))" line is a delayed query call? Otherwise if it isn't then we're loading megabytes of object data from the database where users own one or more ontologies + packages. Not efficient.

Another move would be to make a separate self._get_modifiable_package_ids(request) which doesn't return all the package data, just the list of ids user does manage. So it is a call used only for security check; fetch of package contents is separate.

Also about the "cursor.execute("update geem_package set contents= ... )" statement - its nice that postgresql/django lets one surgically insert the new key/value!

ivansg44 commented 5 years ago

The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code?

Yep, it's only server side call! I will get rid of the security check.

Also, add a pk filter option up front to _get_modifiable_packages() ? Or is it that the "return Package.objects.filter(Q(owner=user))" line is a delayed query call? Otherwise if it isn't then we're loading megabytes of object data from the database where users own one or more ontologies + packages. Not efficient.

QuerySet.filter does indeed return a new QuerySet (i.e., a delayed query call), so I don't think adding a pk filter to _get_modifiable_packages, and implementing _get_modifiable_package_ids is necessary. What do you think?

ivansg44 commented 5 years ago

The _translate_iri(self, request, term_id, pk) is a server side only call, right? I think we can just allow it to be called without additional security check on pk since that is already done in calling code? If later we want to open it up to API, then we offer up a version that is shielded with the extra pk check in self._get_modifiable_packages(request). The translate_iri() may be getting called a bit more soon, not just for the term_id, but also any components that have term_ids.

I just realised that it's not so much a security call, as it is me getting the package with the @context I'm interested in. I could replace the pk parameter in _translate_iri with a QuerySet parameter, and pass the already queried package from create_specifications instead, so _translate_iri does not have to query it again. However, I will still need the pk for the raw SQL query in _translate_iri, so maybe I could pass both the already-queried package and pk as parameters? Either way, I avoid the redundant query as you requested.

ddooley commented 5 years ago

Can _translate_iri just use a query against package pk x context.[prefix] rather than loading the whole dictionary and passing it? Thinking postgresql would be efficient that way. But no bigie if just passing @context dictionary - in fact that might be preferable if calling routinge repeatedly with questions...

ivansg44 commented 5 years ago

Oh, that's a good idea.

ivansg44 commented 5 years ago

@ddooley See latest commit. I made _translate_iri more robust and efficient.

ddooley commented 5 years ago

In both get_specifications() and delete_specifications() this code:

Query specified package

    queryset = self._get_modifiable_packages(request)
    queryset = queryset.filter(pk=pk)

... if queryset.count() == 0:

If that conditional is run that actually triggers the query? (Or is it so sophisticated that it can apply the filterr, get a count, but not set up the returned field data?) So I really think a separate _get_modifiable_package_ids() function should be done to be efficient. It will be a query that returns just ids, rather than the "return Package.objects.filter(Q(owner=user))" thingy.

ivansg44 commented 5 years ago

You're right that it triggers a query, but it seems to be a SELECT COUNT(*) query on the server side. I believe the only thing that gets loaded to front-end memory is an int count of the packages in queryset, as opposed to the actual package objects. I inferred this from the documentation here, which seems to advocate QuerySet.count as being very efficient. Thoughts?

ddooley commented 5 years ago

Ok Great, i see count doesn't trigger fetch columns specified in original query. I found this discussion https://wiki.postgresql.org/wiki/Slow_Counting which confirms that with a WHERE clause, like user=xyz this is quick, and only with no condition they say it is slower, but still doesn't return any data.

cidgoh / geem

Add REST API CRUD (Create UPDATE Delete) actions to new specification API call #33

Query specified package