Open codemickeycode opened 9 years ago
Celery task for processing the data: it will take care of downloading the CSVs from the PhilGEPS endpoint on a periodic basis.
TODO: research Celery and django-celery
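As a starting point for that research, here is a rough sketch of what the periodic schedule could look like in the Django settings. It assumes the Celery app loads its config from Django settings with the CELERY namespace; the task path and the daily interval are placeholders I made up, not something we have decided on.

```python
from datetime import timedelta

# Assumption: the Celery app is configured with
# app.config_from_object("django.conf:settings", namespace="CELERY"),
# so Celery beat reads its schedule from this setting.
CELERY_BEAT_SCHEDULE = {
    "fetch-philgeps-data": {
        # Hypothetical dotted path to the task that downloads the data.
        "task": "philgeps.tasks.fetch_philgeps_data",
        # Assumed interval: once a day. The issue only says "periodic".
        "schedule": timedelta(days=1),
    },
}
```

The schedule value could also be a plain number of seconds or a crontab schedule if we need a fixed time of day.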
The API from data.gov.ph seems to be working. We can access the API via something like:
http://api.data.gov.ph/catalogue/api/action/datastore_search?resource_id=314aa773-e6e4-4554-80ce-4f588212e0d1&limit=1
Each table corresponds to a particular resource. To find the resource for a table, click "More information" on that table here: http://data.gov.ph/catalogue/dataset/philgeps-public-data
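A small sketch of how the request URL could be built per resource. The helper name build_search_url is hypothetical; the base URL and the resource_id and limit parameters are taken from the sample URL above.

```python
from urllib.parse import urlencode

# Base URL taken from the sample request above.
BASE_URL = "http://api.data.gov.ph/catalogue/api/action/datastore_search"


def build_search_url(resource_id, limit=1):
    """Return a datastore_search URL for one resource (hypothetical helper)."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return "{}?{}".format(BASE_URL, query)


# Resource ID copied from the sample URL above.
print(build_search_url("314aa773-e6e4-4554-80ce-4f588212e0d1"))
```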
The solution is to access 2 endpoints:
- datastore_search: the response from this endpoint will contain a total key that indicates the number of records for the resource.
- datastore_search_sql: this will allow us to query the resource using an SQL range, i.e. SELECT * FROM "<resource_id>" WHERE _id BETWEEN 1 AND 1000
This will allow us to loop through the resource up to the total number of records.
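The two-step approach above can be sketched without any network calls. The helper names are hypothetical, and two things are assumptions on my part: that the total key sits under a result object in the JSON response (the usual CKAN response shape), and that _id values run from 1 up to total without gaps.

```python
def extract_total(payload):
    """Read the record count from a datastore_search response.

    Assumption: the response looks like {"result": {"total": N, ...}}.
    """
    return int(payload["result"]["total"])


def id_ranges(total, batch_size=1000):
    """Yield inclusive (start, end) _id ranges covering 1..total."""
    for start in range(1, total + 1, batch_size):
        yield start, min(start + batch_size - 1, total)


def build_sql(resource_id, start, end):
    """Build the range query to send to datastore_search_sql."""
    return 'SELECT * FROM "{}" WHERE _id BETWEEN {} AND {}'.format(
        resource_id, start, end
    )


# Example with a made-up response; a real one would come from the API.
total = extract_total({"result": {"total": 2500}})
for start, end in id_ranges(total):
    print(build_sql("<resource_id>", start, end))
```

The last batch is clamped to total, so the final query never asks for ids beyond the reported record count.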
I suggest using Django's built-in Paginator class to loop through the resource in fixed-size batches, as it is quite memory efficient. A sample usage would look like this:
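    import requests
    from django.core.paginator import Paginator

    # Assumption: same base path as the datastore_search URL above.
    url = 'http://api.data.gov.ph/catalogue/api/action/datastore_search_sql'


    def process(ids):
        # Fetch the rows whose _id falls within this page's range.
        # "adfasfasdf" is a placeholder for the real resource id.
        sql = 'SELECT * FROM "adfasfasdf" WHERE _id BETWEEN {} AND {}'
        params = {
            'sql': sql.format(ids[0], ids[-1])
        }
        res = requests.get(url, params=params)
        ...


    # max_id is the "total" value returned by datastore_search.
    # Passing a range avoids building the full list of ids in memory.
    paginator = Paginator(range(1, max_id + 1), 1000)

    for page_num in paginator.page_range:
        page = paginator.page(page_num)
        process(page.object_list)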
Create a facility for periodically updating/fetching Organization, Awards, Bidders List, Bid Line Item and Bid Information from PhilGEPS Public Data