mig5 closed this issue 2 years ago
Here's another example, this time with categories:
#!/usr/bin/env python3
import requests
from pprint import pprint

def main():
    # Standard header for API calls
    headers = {"Content-Type": "application/json"}
    # URL of the ALISS import API
    url = "https://www.aliss.org"
    r = requests.get(f"{url}/api/v4/import/", headers=headers)
    aliss_data = r.json()["data"]
    while r.json()["next"]:
        next_url = r.json()["next"]
        if url not in next_url:
            next_url = url + next_url
        try:
            r = requests.get(next_url, headers=headers)
            r.raise_for_status()
            aliss_data.extend(r.json()["data"])
        except requests.exceptions.HTTPError as err:
            print(err)
            break
    # Sort services by id
    aliss_data.sort(key=lambda x: x["id"], reverse=False)
    for item in aliss_data:
        if item.get("categories"):
            for cat in item["categories"]:
                print(f"Item {item['id']} has category {cat['slug']}")
        else:
            print(f"Item {item['id']} has no categories")
        pprint(item)

if __name__ == "__main__":
    main()
root@stage01:~# ./aliss_fetch_categories > categories-1.txt
root@stage01:~# ./aliss_fetch_categories > categories-2.txt
If I grep for service 0014b87f-e3b7-49c2-b857-40eb3383e33a
in categories-1.txt I get:
root@stage01:~# grep 0014b87f-e3b7-49c2-b857-40eb3383e33a categories-1.txt
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category social-activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category children-families
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category parent-toddler-group
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category social-activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category children-families
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category parent-toddler-group
If I grep in categories-2.txt I get half the entries (side note: 'activity' always seems duplicated...):
root@stage01:~# grep 0014b87f-e3b7-49c2-b857-40eb3383e33a categories-2.txt
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category social-activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category activity
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category children-families
Item 0014b87f-e3b7-49c2-b857-40eb3383e33a has category parent-toddler-group
If I grep for ffa6216b-4274-4826-910b-be342b51f262
in categories-1.txt I get double the results:
root@stage01:~# grep ffa6216b-4274-4826-910b-be342b51f262 categories-*
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-and-homelessness
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category disability
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category conditions
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category sensory-disability
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category conditions
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-support
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-adaptations
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-and-homelessness
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category disability
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category conditions
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category sensory-disability
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category conditions
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-support
categories-1.txt:Item ffa6216b-4274-4826-910b-be342b51f262 has category housing-adaptations
If I grep for it in categories-2.txt I get no results at all, as though the service wasn't returned! The same happens with plenty of other services.
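One quick way to quantify the drift between two runs is to diff the two dumps as line sets. A minimal sketch (hypothetical helper, assuming the categories-1.txt/categories-2.txt files produced above):

```python
def read_lines(path):
    """Return the set of non-empty, stripped lines in a file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def diff_runs(path_a, path_b):
    """Lines unique to each of two dump files, sorted for stable output."""
    a, b = read_lines(path_a), read_lines(path_b)
    return sorted(a - b), sorted(b - a)

# Example: diff_runs("categories-1.txt", "categories-2.txt") returns the
# "Item ... has category ..." lines that appear in only one of the two runs.
```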
Here's one more example.
This python script fetches all the results from the import route, and then prints the 'name' attribute of each service to a text file.
#!/usr/bin/env python3
import requests
import time

def main():
    # Standard header for API calls
    headers = {"Content-Type": "application/json"}
    # URL of the ALISS import API
    url = "https://api.aliss.org/"
    r = requests.get(f"{url}v4/import/", headers=headers)
    aliss_data = list(r.json()["data"])
    while r.json()["next"]:
        next_url = r.json()["next"]
        # str.strip() removes individual characters, not a prefix,
        # so trim the leading "/api/" explicitly instead
        if next_url.startswith("/api/"):
            next_url = next_url[len("/api/"):]
        next_url = url + next_url
        try:
            r = requests.get(next_url, headers=headers)
            r.raise_for_status()
            aliss_data.extend(r.json()["data"])
        except requests.exceptions.HTTPError as err:
            print(err)
            break
    timestamp = int(time.time())
    with open(f"aliss-names-{timestamp}.txt", "w") as outfile:
        for item in aliss_data:
            outfile.write(item["name"] + "\n")

if __name__ == "__main__":
    main()
Running it twice on the same machine, one right after the other, I get differently ordered results, but I also get results in one fetch that didn't exist in the other (for example: Deeside Stroke Group, Nemo Arts Embroidery).
I have attached 2 outputs of this script so you can compare them to see what I mean.
It feels to me like each request to the ALISS API is actually hitting a different backend server or database, and returning different results depending on which one answers.
Even aside from the script, if I go to your page https://api.aliss.org/v4/import?page=278 in my browser and refresh it several times, eventually 'Deeside Stroke Group' disappears from the results. So I know it's not my script, at least :)
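The browser test above can be automated: fetch the same page repeatedly and compare the returned id sets. A minimal sketch (hypothetical; it uses the standard library's urllib instead of requests, and assumes the "data" and "id" field names seen in the scripts above):

```python
import json
from urllib.request import urlopen

# Page taken from the browser test above.
URL = "https://api.aliss.org/v4/import/?page=278"

def page_ids(url=URL):
    """Fetch one page of the import route and return its set of service ids."""
    with urlopen(url) as resp:
        payload = json.load(resp)
    return {item["id"] for item in payload["data"]}

def check_consistency(fetch=page_ids, attempts=5):
    """Call fetch repeatedly; report whether every call returned the same ids.

    Returns (consistent, drifting_ids), where drifting_ids is the symmetric
    difference between the first response and the first one that disagreed.
    """
    first = fetch()
    for _ in range(attempts - 1):
        current = fetch()
        if current != first:
            return False, first ^ current
    return True, set()
```

If the API is consistent, check_consistency() should always return (True, set()); a False result pinpoints exactly which ids come and go between requests.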
Everything is now working since your fix went live. Thanks!
Hi,
I am trying to use your 'http-import' API route to fetch all services.
I am then trying to make a separate list of unique locations, organisations, and categories, as these do not seem to have an endpoint of their own (e.g. each service may list the same organisation or location each time).
Here is my example script that:
1) fetches all the services
2) sorts them by id
3) iterates over all the services to obtain any 'locations'
4) appends any locations that don't already exist in the outer 'locations' dict
5) sorts the locations by id
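The steps above can be sketched roughly as follows (a hypothetical reconstruction, not the attached script; it uses the standard library's urllib instead of requests, and assumes the 'next' link is returned as an absolute URL):

```python
import json
from urllib.request import urlopen

API_ROOT = "https://api.aliss.org/v4/import/"

def fetch_all_services(url=API_ROOT):
    """Step 1: follow the 'next' links until the import route is exhausted."""
    services = []
    while url:
        with urlopen(url) as resp:
            body = json.load(resp)
        services.extend(body["data"])
        url = body["next"]  # assumed absolute; a relative link would need fixing up
    return services

def unique_locations(services):
    """Steps 3-5: keep each location once, keyed by its id, sorted by id."""
    seen = {}
    for service in services:
        for loc in service.get("locations", []):
            seen.setdefault(loc["id"], loc)
    return sorted(seen.values(), key=lambda loc: loc["id"])

# Step 2 would be services.sort(key=lambda s: s["id"]) before iterating.
```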
What is odd is that although I get the same number of services every time (5560), I can run the same script several times in succession and get different results each time:
Any idea what is going on here?
A slightly modified version of the script that just counts the number of locations (doesn't try to skip any locations that already exist in the array):
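That counting variant reduces to a one-liner (hypothetical sketch; it assumes each service carries a 'locations' list as in the examples above, and tallies every entry, duplicates included):

```python
def count_locations(services):
    """Count every location entry across all services, duplicates included."""
    return sum(len(service.get("locations", [])) for service in services)
```

Comparing this raw count across runs separates "the API returned fewer location entries" from "my dedup logic dropped some".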
I am getting similar issues with the 'categories' and the 'organisations' lists within each Service.
As far as I can tell, there's nothing I can do about this: the data returned from your API is inconsistent from one request to the next.
Appreciate any help!