Closed turtle321 closed 6 years ago
I am about to write tests for this change, but I am wondering how to run the tests: I defined all the required OS vars (CLIENT_KEY, CLIENT_SECRET, USER_PASSWORD, USER_EMAIL, REDIRECT_URL, USER_ORCID) but tests like test_read_record_member_xml
fail because there is no work for the user I have just created. Any suggestion? @MSusik
Ok I wrote some basic tests
In case of statuses other than 200, you can catch RequestException
as shown in the README and retrieve the request or response.
In case of a 200 code, what kind of information would be required?
Yes, in case of exception the client code has access to the requests
' exception and thus the actual response and request.
I would like to have that access also in case of success.
My current use case is to know the HTTP status code in the response. You suggest to assume that it is a 200 when no exception is raised. But that's not enough: request.raise_for_status()
raises for 4xx and 5xx status codes. There is a list of 2xx and 3xx status codes that might result in a successful response, not just 200.
Other use cases I might be interested when debugging is having access to the response headers (for caching, authorization, access control, last modified, ...) or even the request headers/content.
And finally, the code change is just about replacing response
to self.response
, so having the response part of the public interface of that object. It's a small change that enables the features I mentioned.
OK, I found a bit of time to review. First of all, thanks for the PR, I agree that such functionality is useful.
However, I see some things to improve. First of all, the _authenticate
method performs two requests to ORCID. With this PR, the developer is unable to get the first response. Thus, the feature is not implemented for the whole library.
Moreover, I'm not sure if by default this feature should be turned on. If the response from the server is heavy, the software that uses this wrapper will not be able to release the memory unless another request is performed or the wrapper is released. Usually, API wrappers do not implement this feature at all! (see https://github.com/mobolic/facebook-sdk/blob/5d6e78fe26e2e10ebbf06628c20a9f9eb3b50062/facebook/__init__.py#L259 and https://github.com/bear/python-twitter/blob/b2af803a8354cadcc646fe6507fb744d828ff693/twitter/api.py#L309)
It turns out it is not that easy to make this optional seamlessly. I see two solutions that I demonstrate on a toy class
1).
class WithSavedResponse:
def __init__(self, save_last_response=False):
self.save_last_response = save_last_response
self._last_response = None
def set_last_response(self, value):
if self.save_last_response:
self._last_response = value
def get_last_response(self):
if self.save_last_response:
return self._last_response
else:
raise AttributeError("The save last response option was not chosen. Last response not available")
last_response = property(get_last_response, set_last_response)
def function(self):
response = 'content returned by api'
self.last_response = response
return response[1]
w = WithSavedResponse()
try:
w.last_response
except AttributeError:
print("caught")
w = WithSavedResponse(save_last_response=True)
print(w.last_response is None)
res = w.function()
print(w.last_response == 'content returned by api')
print(res == 'o')
2).
import inspect
def remove_last_response(fn):
def wrapper_func(*args, **kwargs):
retval = func(*args, **kwargs)
del self.last_response
return retval
return wrapper_func
class WithSavedResponse:
def __init__(self, save_last_response=False):
if not save_last_response:
for attr in inspect.getmembers(self, inspect.isroutine):
function_name = attr[0]
if (
function_name[:2] != '__'
and function_name[:-2] != '__'
and callable(getattr(self, function_name))
):
# No builtins allowed here
setattr(self, attr[0], remove_last_response(getattr(self, function_name)))
else:
self.last_response = None
def function(self):
self.last_response = 'content returned by api'
return self.last_response[1]
w = WithSavedResponse()
try:
w.last_response
except AttributeError:
print("caught")
w = WithSavedResponse(save_last_response=True)
print(w.last_response is None)
res = w.function()
print(w.last_response == 'content returned by api')
print(res == 'o')
One of them requires additional line for every response, the other requires using inspect
library that is not considered nice. I would say that option 1). is better, but there might be something smarter that would work better.
I see your point about the design of popular API that do not offer this feature. Anyway, in case of a library that wraps another library, I prefer the design that gives access to the original library. This gives more flexibility. Or, to return not just the response content, but also the status code and the headers. That's the case of Elasticsearch pyhton client: https://github.com/elastic/elasticsearch-py/blob/a03504cbb2af34490e31c7b912165028a1535381/elasticsearch/connection/http_requests.py#L94 This option though whould change the signature of our responses, thus I prefer the first one I mentioned.
About the memory usage: the only case when a response is heavy, is when response.content
is large.
Let's look at a typical use case:
api = orcid.PublicAPI(institution_key, institution_secret, sandbox=True)
...
summary1 = api.read_record_public('0000-0001-1111-1111', 'activities', mytoken)
...
summary2 = api.read_record_public('0000-0001-1111-2222', 'record', mytoken)
In this case the value returned by the call (summary1
) is large (and under the hood pointing to the same memory space as PublicAPI::self.response
). The garbage collector will take care of freeing up memory when api
is not used anymore, the same for summary1
. If instead api
is re-used then PublicAPI::self.response
is overridden and the memory freed up.
So the memory footprint is not significatly affected by this code change (given also the type of data we are dealing with: text).
I'd love to have this feature optional, but given that this would mean to complicate the code a lot, I'd prefer to have this as default behavior.
And about the _authenticate()
method, this method performs 2 operations: asking for the authorization code and logging in using the end-user's credentials. I suggest to either remove self.response
in this method entirely or to store only the authorization code call's response.
(the issue arises from the fact that the end-user login should not happen here: the end-user should be redirected to ORCID's website where he should login and be redirected back to our website with an authorization code that later on we exchange for an access token - using get_token_from_authorization_code
. We are an untrusted application and we should not know the end-user's ORCID password. For trusted application instead, OAuth supports the password flow
- where the third-party knows the end'user's password - but this is not available in ORCID's API because they have no trusted apps)
I see your point about the design of popular API that do not offer this feature.
Still I believe it should be available. I really like the solution of elasticsearch
wrapper. Note that the response is not saved - the relevant fields from request
are returned to the user. Unfortunately we can't use this solution due to backcompability.
So the memory footprint is not significatly affected by this code change
I would argue against it. Obviously as we deal with text, the footprint is not measured in GBs. But the library should not be an obstacle if sb wanted to write a software where downloading data from ORCID is a part of a large pipeline spawned in hundreds of processes. Then you can write code like:
class Pipeline:
def __init__(self, arg1, arg2):
self.api = orcid.memberAPI(arg1, arg2)
def run_pipeline(self):
small_info = self.get_data_from_orcid()
self.run_time_consuming_code(small_info)
def get_data_from_orcid(self):
small_info = orcid.something()['somekey']
return small_info
...
I am not sure what is the behaviour of the API now, but back when version 1.2 of API was used, in case of downtimes, an HTML page with significant amount of JS code in it was returned as content for any request.
I'd love to have this feature optnional, but given that this would mean to complicate the code a lot, I'd prefer to have this as default behavior.
Sometimes API object is released just after it was used (and probably it's the case of INSPIRE) but in many cases it is not the natural way to write the code. Actually, one does not expect from a wrapper to store any data that is not required for the communication. If the feature is optional and documented in README, developer who uses the library will be aware of this behaviour. Otherwise he stumbles upon a weird behaviour he might need/want to debug.
I would say in this case it's better to add few more lines in the code in order to provide the user with expected behaviour. I came up with one more solution (don't know if it's nicer than previous):
from contextlib import contextmanager
@contextmanager
def catch_response(instance):
instance.save_last_response = True
yield
instance.save_last_response = False
instance._last_response = None
class WithSavedResponse:
def __init__(self):
self.save_last_response = False
self._last_response = None
def set_last_response(self, value):
if self.save_last_response:
self._last_response = value
def get_last_response(self):
if self.save_last_response:
return self._last_response
else:
raise AttributeError("Last response is only availabe within")
last_response = property(get_last_response, set_last_response)
def function(self):
response = 'content returned by api'
self.last_response = response
return response[1]
w = WithSavedResponse()
try:
w.last_response
except AttributeError:
print("caught")
with catch_response(w):
res = w.function()
print(w.last_response == 'content returned by api')
print(res == 'o')
print(w._last_response == None)
try:
w.last_response
except AttributeError:
print("caught")
And about the _authenticate() method, this method performs 2 operations: asking for the authorization code and logging in using the end-user's credentials. I suggest to either remove self.response in this method entirely or to store only the authorization code call's response.
OK, I fully agree. Probably the most consistent behaviour is to store only the request from get_token_from_authorization_code
that is called in _authenticate
. I assume that's what you meant by removing self.response
entirely.
As the discussion is getting longer: Is there a chance that what you want to achieve within INSPIRE would be a good functionality for this library? Maybe the response does not have to be saved if the handling can be done within this codebase.
All right. I made this feature optional, I found a simple way with a kwarg in the client constructor. So this client would have the feature disabled:
orcid.PublicAPI(institution_key, institution_secret)
while this one would have it enabled:
orcid.PublicAPI(institution_key, institution_secret, do_store_raw_response=True)
@turtle321 1.0.3 was released. It includes this PR.
We have some use cases in our team (Inspire at CERN) where we need some introspection on the actual response (the response coming from the Python HTTP
requests
library). Introspections about HTTP status codes, request and response content and headers