Closed josh-chamberlain closed 1 year ago
Code I used to create queries based on a user's input for the GUI:
import requests
import jmespath

# Get user input from the text box
homepage_url = self.homepageURLSearch_input.text()
owner, repo, branch = 'pdap', 'datasets', 'master'
# Create query (note: this interpolates raw user input into SQL)
query = f'SELECT * FROM `agencies` WHERE `homepage_url` LIKE "%{homepage_url}%"'
# Send the query to the DoltHub SQL API
res = requests.get(f'https://www.dolthub.com/api/v1alpha1/{owner}/{repo}/{branch}', params={'q': query})
# Parse the response as JSON
jsoned = res.json()
# Keep only the "rows" array
expression = jmespath.compile("rows[]")
self.searched = expression.search(jsoned)
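One caveat with the snippet above: a stray quote or `%` in the search box will break the interpolated query. A minimal hardening sketch, using the same DoltHub endpoint; the `escape_like` and `build_search_query` helper names are mine, not part of the existing code:

```python
def escape_like(term: str) -> str:
    """Escape characters that are special inside a double-quoted SQL LIKE pattern."""
    return (term.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("%", "\\%")
                .replace("_", "\\_"))

def build_search_query(homepage_url: str) -> str:
    """Build the agencies search query with the user's term escaped."""
    return f'SELECT * FROM `agencies` WHERE `homepage_url` LIKE "%{escape_like(homepage_url)}%"'

def search_agencies(homepage_url, owner="pdap", repo="datasets", branch="master"):
    """Run the search against the DoltHub SQL API and return the rows."""
    import requests  # imported lazily so the query builders work without the dependency
    res = requests.get(
        f"https://www.dolthub.com/api/v1alpha1/{owner}/{repo}/{branch}",
        params={"q": build_search_query(homepage_url)},
    )
    res.raise_for_status()
    return res.json().get("rows", [])
```

This keeps the query building separate from the network call, which also makes it easy to test the escaping on its own.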
Related to https://github.com/Police-Data-Accessibility-Project/PDAP-Scrapers/issues/80, #173
Tasks
General purpose
This is a Python module, called something like extraction_metadata.py, in /common, which generates metadata on the fly by using the DoltHub API to get the most up-to-date information about the scraper at the time it's run.
Pinging the DoltHub API
Because scrapers and datasets change constantly, this should be done on the fly.
The Python 3 code above gets all the agencies. We should still write a more useful query that just substitutes in the dataset ID.
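A hedged sketch of what that dataset-ID query could look like; the `datasets` table name, `id` column, and function names here are my assumptions, not confirmed schema:

```python
def build_dataset_query(dataset_id) -> str:
    """Select a single dataset row by ID; coercing to int also keeps the query safe."""
    return f"SELECT * FROM `datasets` WHERE `id` = {int(dataset_id)}"

def get_dataset_metadata(dataset_id, owner="pdap", repo="datasets", branch="master"):
    """Fetch up-to-date metadata for one dataset from the DoltHub SQL API."""
    import requests  # imported lazily so build_dataset_query works without the dependency
    res = requests.get(
        f"https://www.dolthub.com/api/v1alpha1/{owner}/{repo}/{branch}",
        params={"q": build_dataset_query(dataset_id)},
    )
    res.raise_for_status()
    return res.json().get("rows", [])
```

A scraper could then call `get_dataset_metadata(dataset_id)` at startup to pull its metadata on the fly rather than hardcoding it.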
Sample metadata