gitcoinco / cerebro

initial MVP #1

Open owocki opened 4 years ago

owocki commented 4 years ago

initial cerebro external MVP

owocki commented 4 years ago

@kamescg @danlipert any detail to add before i fund this + set Kames loose on it?

kamescg commented 4 years ago

@owocki + @danlipert How do you want me to split the GitHub scraper and frontend?

I normally use Lerna to manage multi-package/application repos, but this is slightly different because it's a frontend and backend setup.

Perhaps just a regular monorepo?

owocki commented 4 years ago

defer to dan on all tech decisions!

danlipert commented 4 years ago

@KamesCG monorepo is fine. Let's start with the backend and make sure the API works correctly. I'm thinking the main Gitcoin web repo will integrate at some point via the API, so let's make sure the frontend and backend are not too tightly coupled.

kamescg commented 4 years ago

@danlipert Few questions.

I am imagining I will set up 2 databases: a data "warehouse" and a data "store".

1) Warehouse: Initially collects the raw data points (GitHub, Stack Overflow, Twitter, etc.)
2) Store: A better-organized database that holds the cleaned-up warehouse information? For example, once the data is deduplicated, it would go in the store.

Or should it just be 1 big warehouse?
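
To make the warehouse-to-store step concrete, here is a minimal sketch of the deduplication pass in Python. Everything in it is an assumption for illustration, not anything settled in this thread: the `RawRecord` shape, the `(source, external_id)` key, and the "keep the most recently fetched copy" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RawRecord:
    """One scraped item as it might sit in the warehouse (hypothetical shape)."""
    source: str        # e.g. "github", "stackoverflow", "twitter"
    external_id: str   # identifier assigned by the source
    payload: dict      # raw scraped data, kept unmodified
    fetched_at: int    # Unix timestamp of the scrape


def deduplicate(records):
    """Keep only the most recently fetched record per (source, external_id)."""
    latest = {}
    for record in records:
        key = (record.source, record.external_id)
        current = latest.get(key)
        if current is None or record.fetched_at > current.fetched_at:
            latest[key] = record
    return list(latest.values())


# Usage: two scrapes of the same GitHub user collapse into one store row.
warehouse = [
    RawRecord("github", "example-user", {"followers": 10}, fetched_at=100),
    RawRecord("github", "example-user", {"followers": 12}, fetched_at=200),
    RawRecord("twitter", "example-user", {"followers": 50}, fetched_at=150),
]
store_rows = deduplicate(warehouse)
```

Keying on the source as well as the ID keeps the same handle on two platforms from being merged by accident; linking identities across platforms would be a separate step.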

danlipert commented 4 years ago

Definitely, I think 2 databases is appropriate. One will contain the relevant data needed for on-the-fly queries by the frontend application (preferably Postgres). The other will likely be a NoSQL-type database that stores the raw scraped data, which is later processed; probably something like Cassandra. We can likely skip the Cassandra portion for now if we start with one data source whose API feeds the relevant data directly, so I'm thinking let's start with GitHub and skip the warehouse portion for now.
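
A rough sketch of what "start with GitHub and skip the warehouse" could look like: user objects from the GitHub API are written straight into the relational store that the frontend queries. This is only an illustration under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a server, and the `github_profile` table, its columns, and the helper names are hypothetical. The `login`, `name`, `public_repos`, and `followers` keys are fields of the GitHub REST API user object; which fields Cerebro actually needs is not decided here.

```python
import sqlite3


def create_store(conn):
    """Create the (hypothetical) table the frontend would query."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS github_profile (
            login        TEXT PRIMARY KEY,
            name         TEXT,
            public_repos INTEGER,
            followers    INTEGER
        )
        """
    )


def upsert_profile(conn, user):
    """Insert or overwrite one profile from a GitHub user JSON object."""
    conn.execute(
        "INSERT OR REPLACE INTO github_profile "
        "(login, name, public_repos, followers) VALUES (?, ?, ?, ?)",
        (
            user["login"],
            user.get("name"),
            user.get("public_repos", 0),
            user.get("followers", 0),
        ),
    )
    conn.commit()


def top_by_followers(conn, limit=10):
    """Example of an on-the-fly query the frontend might issue via the API."""
    cursor = conn.execute(
        "SELECT login, followers FROM github_profile "
        "ORDER BY followers DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

Because the primary key is `login`, re-fetching the same user overwrites the earlier row, so no separate deduplication pass is needed while GitHub is the only source.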