cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

cBioPortal Data Collection Automation #84

Open inodb opened 6 years ago

inodb commented 6 years ago

Background:

The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.

Whenever data submissions come from external sources, a lot of manual curation is needed to make sure the data is imported smoothly and rendered correctly in the cBioPortal. We would like to automate parts of this data curation process, which will be handled in part through our datahub, a data repository that stores all cancer study data currently available in the cBioPortal.

Currently, whenever a pull request is made to datahub, the data undergoes a series of validation steps run by our data validation tool. However, to ensure that the data looks and renders as expected in the cBioPortal, one must manually import the data into a live instance of the portal. Automating this step in particular would benefit the QC process and shorten the turnaround time from data submission to import and visualization in the cBioPortal.


Goal:

Streamline and improve the turnaround time and review process for cancer study data submissions by automating the import of validated data files into a live instance of the cBioPortal.

Approach:

One option for spinning up review apps is Heroku, which we use for reviewing changes to the backend of cBioPortal.

Another option might be a GitHub Action for AWS Lightsail.

Both platforms support Docker Compose, for which configuration files already exist.
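
As a rough illustration of one early step such automation might need, the sketch below maps the files changed in a pull request to the study directories that would have to be imported into a review instance. The `public/<study_id>/` layout, the function name `changed_studies`, and the example file names are assumptions made for illustration; this is not an existing cBioPortal tool.

```python
from pathlib import PurePosixPath


def changed_studies(changed_files, studies_root="public"):
    """Return the sorted study directory names touched by a pull request.

    Assumes (hypothetically) that each study lives in
    <studies_root>/<study_id>/ inside the data repository.
    """
    studies = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Only paths shaped like <studies_root>/<study_id>/<file...> count;
        # top-level files such as README.md are ignored.
        if len(parts) >= 3 and parts[0] == studies_root:
            studies.add(parts[1])
    return sorted(studies)


if __name__ == "__main__":
    # Hypothetical list of files changed in a pull request.
    files = [
        "public/study_a/data_clinical.txt",
        "public/study_a/meta_study.txt",
        "README.md",
        "public/study_b/case_lists/all.txt",
    ]
    print(changed_studies(files))
```

A review-app job could then validate and import only the studies returned here, rather than reloading every study in the repository.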


Needed skills:

Possible mentors: @inodb

css911 commented 5 years ago

Hello! It's Chetan. The idea is quite interesting, and I would like to work on it. What task should I start with?

inodb commented 4 years ago

@ao508 I noticed this was transferred from GSoC. If we are not working on it, maybe we can transfer it back?

ao508 commented 4 years ago

@inodb that's okay with me

daniocionini commented 2 years ago

Very interesting idea. I would like to have a go at it. Where is the open-source code to start from?

jagnathan commented 2 years ago

The source code is available on GitHub: https://github.com/cBioPortal

devharsh2k4 commented 1 year ago

Hey, I am interested in this project. Can you guide me further, @inodb?

muskan-k commented 1 year ago

Hi @inodb! I'm Muskan Kothari, currently a CSE senior at PES University, India. I'm here to contribute to this project through GSoC '23. I studied biology prior to starting undergrad in CSE and I'm highly interested in applying CSE to interdisciplinary domains. I have multiple projects applying computer science fundamentals to biology (measures of lexical diversity and Alzheimer's detection) and physics (tree-based models for the critical temperature of superconductors).

I also have experience working with big data and DevOps technologies like Docker and Kubernetes (converting a monolithic application to microservices), and PySpark and Hadoop (sentiment analysis of Twitter).

I am proficient in programming languages like Python, C++ and Java, and comfortable using Git.

I found the cBioPortal organization a perfect mix of my interests in interdisciplinary projects and my skills in various technologies that particularly help this project - cBioPortal Data Collection Automation. I'd love to learn and contribute to this project.

I understand that working on some issues would strengthen my application, and I will also be spending time understanding the organization. I'd like to get started with my proposal. I've joined the Slack as well.

Could we perhaps set up a discussion call? Could you tell me what technologies would be involved under DevOps?

Thanks! Muskan