cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

Prototype Spark + Parquet + Spring Boot for web API #69

Closed by jjgao 4 years ago

jjgao commented 5 years ago

Background:

Currently, cBioPortal uses a relational database (MySQL) for data storage. As the number and size of cancer datasets increase, we are interested in exploring high-performance computing and storage options so that cBioPortal remains responsive for users.

Goal: Develop a prototype that implements a subset of our current web API with Spark + Parquet (or another DBMS) + Spring Boot. The new API should perform better than the current one on large datasets.

Approach:

Phase 0 (before submitting proposal):

Phase 1:

Phase 2 (optional):

Note: If you have a better idea and would like to propose a different stack, please contact us during Phase 0 before proceeding.

Required skills: Spark, Parquet, Java / Spring Boot

Possible mentors:

GayanSandaruwan commented 5 years ago

Hi, I'd like to work on this project. For Phase 0, what capabilities should the demo web app have?

jjgao commented 5 years ago

@GayanSandaruwan thanks for your interest. Phase 0 would be a simple proof of concept, so as long as you can use the stack to develop a website, it would be good enough.

GayanSandaruwan commented 5 years ago

Great 👌

On Thu, Mar 21, 2019, 4:55 PM JianJiong Gao notifications@github.com wrote:

@GayanSandaruwan https://github.com/GayanSandaruwan thanks for your interest. Phase 0 would be a simple proof of concept, so as long as you can use the stack to develop a website, it would be good enough.

n1zea144 commented 5 years ago

@jjgao One thing we should add to the Approach, which I think belongs between Phase 1 and Phase 2, is the ability to fetch data from multiple studies.

jjgao commented 5 years ago

@n1zea144 Good point. I've updated the text.

justasunil commented 4 years ago

@jjgao Hi! I want to work on this project. Will it be offered in GSoC 2020?

jjgao commented 4 years ago

@sunil-17112 thanks for your interest. This was done in 2019.