enjalot / blockbuilder-search-index

download and process d3.js blocks for further indexing and visualization
BSD 3-Clause "New" or "Revised" License

create a gcp cloud function to clone gists #47

Open micahstubbs opened 7 years ago

micahstubbs commented 7 years ago

questions

enjalot commented 7 years ago

Let's try using PubSub to trigger the functions (see Tutorial)

We can send messages like this: {usernames: ['enjalot', 'micahstubbs'], since: "2017-09-30T20:34:21.648Z" } to the PubSub channel, each carrying a subset of all our users, and let Cloud Functions figure out how to scale horizontally to handle them.
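A minimal sketch of how the user list might be split into per-message subsets. The helper name `buildMessages` and the `batchSize` parameter are hypothetical, and the payload key `users` follows the publish commands later in this thread rather than the `usernames` key written above.

```javascript
// Split the full user list into fixed-size batches, one Pub/Sub message each.
// `batchSize` is a hypothetical tuning knob, not a value from this thread.
function buildMessages(users, batchSize, since) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const messages = [];
  for (let i = 0; i < users.length; i += batchSize) {
    const payload = { users: users.slice(i, i + batchSize) };
    // Leave `since` out entirely when absent, meaning "fetch all blocks".
    if (since !== undefined) payload.since = since;
    messages.push(JSON.stringify(payload));
  }
  return messages;
}

// Example: four users, two per message.
const exampleMessages = buildMessages(
  ['enjalot', 'micahstubbs', 'ivyywang', 'tarekrached'],
  2,
  '2017-09-30T20:34:21.648Z'
);
exampleMessages.forEach((m) => console.log(m));
```

Each resulting string could be passed as the --message value when publishing to the topic.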

Potential query rules around time: since could be a relative value such as 15min or 20min, or an ISO datetime string. If it's omitted, we get all blocks. This follows GitHub's since query parameter for listing gists.
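One way the since rules could be normalized before calling the GitHub API, as a sketch only. The function name `resolveSince` and the exact `<N>min` pattern are assumptions; the thread only says the value could be 15min, 20min, or an ISO string.

```javascript
// Turn a `since` value into an ISO 8601 timestamp, or undefined for "all blocks".
// Accepts "<N>min" (relative to `now`) or anything Date can parse as a datetime.
function resolveSince(since, now = new Date()) {
  if (since === undefined || since === null || since === '') {
    return undefined; // no filter: fetch everything
  }
  const relative = /^(\d+)min$/.exec(since);
  if (relative) {
    const minutes = Number(relative[1]);
    return new Date(now.getTime() - minutes * 60 * 1000).toISOString();
  }
  const parsed = new Date(since);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Unrecognized since value: ${since}`);
  }
  return parsed.toISOString();
}

// Example: 15 minutes before a fixed reference time.
const reference = new Date('2017-09-30T20:34:21.648Z');
console.log(resolveSince('15min', reference));
```

The returned string can be sent as the since query parameter; when it is undefined, the parameter is simply left off the request.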

enjalot commented 7 years ago

Following the tutorial, I made the blockbuilder-search-function-staging storage bucket to stage our functions.

Deploy the function:

gcloud functions deploy getGists --trigger-topic get-gists --memory 1024

Trigger the function:

gcloud pubsub topics publish get-gists --message '{"users":["micahstubbs","enjalot"]}'
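Pub/Sub delivers the message body to the function base64-encoded, so the function has to decode it before reading users. A minimal sketch of that step follows; the helper name `decodePayload` is hypothetical, and where the encoded string sits on the event object depends on the runtime version, so it is taken here as a plain argument.

```javascript
// Decode a base64-encoded Pub/Sub message body into the JSON payload.
function decodePayload(base64Data) {
  const json = Buffer.from(base64Data, 'base64').toString('utf8');
  const payload = JSON.parse(json);
  if (!Array.isArray(payload.users)) {
    throw new Error('payload.users must be an array of usernames');
  }
  return payload;
}

// Example: round-trip the message published above.
const encoded = Buffer.from('{"users":["micahstubbs","enjalot"]}').toString('base64');
console.log(decodePayload(encoded).users);
```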

enjalot commented 7 years ago

Developing with the cloud functions emulator

functions deploy getGists --trigger-topic get-gists
functions call getGists --data='{"users":["ivyywang","tarekrached"]}'
functions logs read

enjalot commented 7 years ago

Note: the biggest constraint is probably memory. Writing to /tmp uses memory, because /tmp in Cloud Functions is an in-memory filesystem.

enjalot commented 6 years ago

I pushed some changes, and updated the above comments to be in step with the latest in cloud functions.

Re: API rate limit: Cloud Functions don't get us around it, but we can rotate the OAuth keys they use whenever one runs out.
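A rough sketch of what the rotation could look like, not the code that was pushed. The class name `TokenPool` is hypothetical; it assumes the caller reads the remaining quota from GitHub's X-RateLimit-Remaining response header and passes it in.

```javascript
// Round-robin pool of OAuth tokens; advances when the current one is exhausted.
class TokenPool {
  constructor(tokens) {
    if (!Array.isArray(tokens) || tokens.length === 0) {
      throw new Error('TokenPool needs at least one token');
    }
    this.tokens = tokens;
    this.index = 0;
  }

  // Token to use for the next request.
  current() {
    return this.tokens[this.index];
  }

  // Report the remaining quota for the current token (e.g. parsed from the
  // X-RateLimit-Remaining header); moves to the next token once it hits zero.
  report(remaining) {
    if (remaining <= 0) {
      this.index = (this.index + 1) % this.tokens.length;
    }
    return this.current();
  }
}

// Example with placeholder token strings.
const pool = new TokenPool(['token-a', 'token-b']);
console.log(pool.report(0)); // first token exhausted, so the pool advances
```

The pool lives in memory, so its position is not guaranteed to survive across function invocations. The sketch also does not track when an exhausted token's quota resets.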

I made it so the getGists function now saves the gists it fetches in the gists namespace as gist entities. I had to modify the gist metadata slightly to remove . from filenames (replaced with | to make the Datastore API happy). I also store the username and GitHub id of the user as owner_login and owner_id and delete the owner object. This way it's easy to filter by username in Datastore.
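The metadata changes described above could look roughly like this. It is a sketch under assumptions: the function name `toGistEntity` is hypothetical, and it assumes the filenames being rewritten are the keys of the gist's files object, which is how the GitHub API returns them.

```javascript
// Reshape a gist API object into the form described above:
// dots in file keys become "|", owner is flattened to owner_login / owner_id.
function toGistEntity(gist) {
  const { owner, files, ...rest } = gist;
  const safeFiles = {};
  for (const [name, file] of Object.entries(files || {})) {
    safeFiles[name.replace(/\./g, '|')] = file;
  }
  return {
    ...rest,
    files: safeFiles,
    // Guard against a missing owner object.
    owner_login: owner ? owner.login : null,
    owner_id: owner ? owner.id : null,
  };
}

// Example with made-up id values.
const entity = toGistEntity({
  id: 'abc123',
  owner: { login: 'enjalot', id: 1 },
  files: { 'index.html': { filename: 'index.html' } },
});
console.log(Object.keys(entity.files), entity.owner_login);
```

Only the keys are rewritten here; the filename field inside each file object keeps its original dotted name, so the original can still be recovered.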