Open kochbj opened 2 years ago
I took a peek at the code and really like the abstraction and introduction of a OpenAlexBaseDataType. Thanks 🤩 I have a lot on my plate right now, so it will probably take me a week or two to take a closer look.
Sounds good. Personally, I would've dropped the two works methods and just made one list of entities method, but wanted to make it backwards compatible with current methods.
Also, no worries. I will probably make a few more minor changes in next week or so as I continue to use before you get around to looking at it.
I'm a fan of https://en.wikipedia.org/wiki/KISS_principle?wprov=sfti1 Let's remove redundant methods and keep general ones.
Which ones do you suggest we remove?
Well if it were me, I would probably remove all entity-specific methods and just keep get list of entities, not make it a private method, with an argument for entity type. I didn't write it this way because I didn't want to break what you already had.
I would keep the cited,reference, and related works methods--I think they are great. For myself I needed to add the below, which should work with any entity type that supports associated_works (authors, venues, institutions). I needed all the header stuff to make sure it would work for dehydrated objects as well.
I'm also going to fool around with a search method if you give me a day or two...
class OpenAlex(BaseModel):
"""This models the OpenAlex HTTP API
OpenAlex has 2 pools for clients.
Supplying your email will get you into the polite pool.
:parameter=email
"""
email: Optional[EmailStr]
_base_url: str = "https://api.openalex.org/"
_headers: dict = {
"Accept": "application/json",
"User-Agent": f"OpenAlexAPI https://github.com/dpriskorn/OpenAlexAPI"
}
#Convenience dict because dehydrated entities do not have works_api_urls and annoying inconsistencies in endpoints (institution vs instititions, host_venue vs venue)
_works_urls: dict = {
Author: _base_url+"works?filter=author.id:",
Concept: _base_url+"works?filter=concept.id:",
Institution: _base_url+"works?filter=institutions.id:",
Venue: _base_url+"works?filter=host_venue.id:"
}
``` class Config:
underscore_attrs_are_private = True
def set_email(self,email: EmailStr):
self.email = email
self._headers = {
"Accept": "application/json",
"User-Agent": f"OpenAlexAPI https://github.com/dpriskorn/OpenAlexAPI mailto:{self.email}"
}`
` @backoff.on_exception(backoff.expo,
(requests.exceptions.Timeout,
requests.exceptions.ConnectionError),
max_time=60,
on_backoff=print(f"Backing off"))
def get_associated_works(self, entity: OpenAlexBaseType, limit: int = None) -> List[Work]:
"""Fetches all works associated with the entity, up to some limit.
:parameter work is OpenAlex Institution, Venue, Author
:parameter limit is the maximum number of works to return
"""
if self.email is None:
print("OpenAlex has 2 pools for clients. "
"Please be nice and supply your email as the first argument "
"when calling this class to get into the polite pool. This way "
"OpenAlex can contact you if needed.")
per_page = PAGE_LIMIT if limit is None else min(PAGE_LIMIT, limit)
works = []
cursor = '*'
while cursor:
url = f"{self._works_urls[type(entity)]}{entity.id}&per_page={per_page}&cursor={cursor}"
response = requests.get(url, headers=self._headers)
if response.status_code == 200:
works += [Work(**w) for w in response.json()['results']]
cursor = response.json()['meta']['next_cursor']
else:
raise ValueError(f"Got {response.status_code} from OpenAlex")
if limit and len(works) >= limit:
break
return works[:limit]
`
Last couple of thoughts:
-Institutions, Venues, and Author fields need to be expanded (but methods have to work with dehydrated values) -There are a couple supplementary objects (i.e., years, ids) that show up on multiple core types. Do you want to just expand these so that they work for every entity or have entity specific entities. -The formatting of Year in OpenAlex is annoying; It would make more sense to have these attributes as lists rather than dicts and fill in 0 years rather than omitting them. Do you want to parse these into lists or do you feel that's beyond the scope of the package?
@hstct you implemented a dehydrated author type in #1 which has not yet been merged. Do you have any reactions to this pull request?
I'm also going to fool around with a search method if you give me a day or two...
Have you seen the search function in R here? https://github.com/KTH-Library/openalex/blob/main/R/open_alex_restclient.R BTW I just created a project where we can work on user stories that make sense to aid in the development of the library https://github.com/users/dpriskorn/projects/9/views/1
Hi Dennis,
If you give me less than an hour, I'm going to implement another branch. This one breaks current works functionality by generalizing for entities. The full set of changes I am adding:
Responses to your points:
Hi Dennis,
I'm pretty much done with my rewrite in last major push. This does break existing get_works, but I think it does make API quite robust. I also changed my tune on dehydrated objects--they're added.
https://github.com/dpriskorn/OpenAlexAPI/pull/10/commits/e97491f3588d8ca1bb5d54f501a9249214c030aa
Best, Bernie
Big thanks for your effort. I'll try to make time to review it next week. Since we are still in alpha I don't have a problem with removing get_works() at this point.
Hi Dennis,
Great API. I made some small changes to make the API more useful to me as a user:
Fleshed out author fields and modified Ids and Years a little bit so they work with both authors and works.
Dunno if this is how you planned to extend things so no worries if this doesn't work for you!
Best, Bernie