CatalystCode / project-fortis-pipeline

Project Fortis is a data ingestion, analysis and visualization pipeline.
Apache License 2.0

Improve developer experience #228

Closed: c-w closed this issue 6 years ago

c-w commented 6 years ago

Onboarding new developers onto Fortis is hard since there are so many moving pieces. We should have Docker images for every service so that we can use docker-compose up to spin up the pipeline with a single command.

@cicorias also has some thoughts on this.

erikschlegel commented 6 years ago

Can we be more prescriptive about what these missing pieces are? Saying that we should have Docker images for every service seems ambiguous. Things like setting up auth and Event Hubs can't be dockerized in Azure.

c-w commented 6 years ago

For every service that's not in Azure, so anything that a developer would have to set up. Off the top of my head:

erikschlegel commented 6 years ago

@cicorias your comments seem ambiguous. I still don't have a clear sense of where the crux of the problem is. I see this as more of an issue around lack of documentation and the absence of a dev/QA instance. Today is the first time I'm seeing this issue raised. I agree that we need to improve the overall dev experience, but we should be crystal clear about where the core dev setup pain points reside.

If there are QA instances for both Cassandra and Postgres, then it doesn't make sense for each developer to stand up local instances of each stack. Our Postgres instance holds an entire copy of OSM.

I agree that we need a repeatable process for devs to set up a local instance of Spark. Just as an FYI, we have already been clear about what Spark documentation is missing in the project: https://github.com/CatalystCode/project-fortis-spark/issues/165.

Setting up both services and interface is a matter of setting your env variables, then 1) git clone, 2) npm install, 3) npm start. I can see devs getting tripped up on which env vars to set, and beefing up our docs will help there.
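A minimal sketch, assuming a small Python preflight script run before npm start, of a check that would surface missing environment variables up front instead of as a runtime failure. The names in REQUIRED_VARS are hypothetical placeholders; the actual settings for services and interface are not listed in this thread.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical placeholder names for illustration only; substitute the
# variables the services/interface repositories actually read.
REQUIRED_VARS = ["CASSANDRA_CONTACT_POINTS", "CASSANDRA_KEYSPACE", "PORT"]


def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty, in order."""
    # Default to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

On a machine where none of the placeholders are set, missing_env_vars(REQUIRED_VARS) returns all three names, which a setup script could print alongside pointers to the docs.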

c-w commented 6 years ago

The pain point is that developers don't know how to set up the project. We've seen this concretely recently with the onboarding of 2 SDEs and 1 Senior SDE who all struggled with this. This loss of productivity is unacceptable so we should do something about it.

Documentation is one way to improve developer experience; however, it tends to get out of date and can be ambiguous.

We have tools like Docker and Docker Compose to remove this ambiguity and ensure that developers can always set up the project in a repeatable way across machines and environments. As such, I'd say that we should just have a docker-compose.yml in every repository that spins up the service and all the dependencies for that repository.

See https://github.com/CatalystCode/project-fortis-services/pull/167 for an example of how to go about this. After that pull request is merged, it's just one command to bring up the project-fortis-services: docker-compose up. This is much nicer than the previous steps where people manually had to install Cassandra, set environment variables, install the NPM dependencies and then run the server.

c-w commented 6 years ago

Pointer to the Docker work for project-fortis-spark done by @armanrahman22 and @michaelperel: https://github.com/armanrahman22/fortis-docker

erikschlegel commented 6 years ago

Docker will help but still doesn’t solve the developer task of defining the local environment, which is also part of the problem. More documentation will also help solve that problem.

c-w commented 6 years ago

The work involved here is:

c-w commented 6 years ago

Here are the Azure services that are referenced in the services which need to be created to run the project locally:

c-w commented 6 years ago

We now have an ARM template and script to set up the required Azure resources (like ServiceBus, CognitiveServices, etc.) for a local deployment of Fortis. So all that's left now to have a good developer experience is #237 that @jcjimenez is working on.

c-w commented 6 years ago

The sample dataset is quite large, so Cassandra keeps choking when importing it via the COPY command. Looking into this now.

c-w commented 6 years ago

Fixed the COPY issue by reducing the keyspace's replication_factor down to 1 when running locally.
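For reference, a sketch of the CQL this change implies, assuming the local keyspace uses SimpleStrategy (a reasonable fit for a single-node, single-datacenter dev cluster). The helper keyspace_replication_cql and the keyspace name used below are hypothetical and not taken from the Fortis repositories.

```python
import re

# Cassandra keyspace names: up to 48 alphanumeric characters or underscores,
# starting with a letter or digit.
_KEYSPACE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]{0,47}")


def keyspace_replication_cql(keyspace: str, replication_factor: int = 1,
                             alter: bool = False) -> str:
    """Build a CQL statement that sets a keyspace's replication factor.

    SimpleStrategy is an assumption suited to a local single-node cluster;
    multi-datacenter deployments would normally use NetworkTopologyStrategy.
    """
    if not _KEYSPACE_NAME.fullmatch(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )
    if alter:
        # Lower the factor on an existing keyspace instead of recreating it.
        return f"ALTER KEYSPACE {keyspace} WITH replication = {replication};"
    return (f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
            f"WITH replication = {replication};")
```

For example, keyspace_replication_cql("fortis") returns CREATE KEYSPACE IF NOT EXISTS fortis WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; which could be run in cqlsh before the COPY import.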

c-w commented 6 years ago

Fixed a few issues in the sample dataset. The interface now loads and displays some data, but initial render seems somewhat broken (e.g. no keyword selected). Looking into it.

c-w commented 6 years ago

All seems to be good now.

docker-compose command:

[screenshot]

Admin page:

[screenshot]

Dashboard page:

[screenshot]