cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

Standalone Importer w/Spring Batch #68

Closed by ao508 3 years ago

ao508 commented 5 years ago

Background: cBioPortal is a valuable resource for navigating, downloading, visualizing, and analyzing cancer genomics data. The portal supports a variety of data types and file formats, and we provide a meta import script to facilitate importing this data. To learn more about the overall cBioPortal data loading process, please refer to the data loading documentation.

The meta import script is written in Python and wraps the Java importer classes from the cBioPortal core-scripts package. However, in an effort to make the cBioPortal backend a full Spring MVC application, we wish to refactor the importer and move it to an external dependency that can run as a simple standalone tool. An overview of the expected workflow is detailed here.

Goals: This project has two overall goals:

  1. Refactor the cBioPortal importer (scripts) package to an external repository that can be run as a standalone Java program.
  2. Migrate the cBioPortal model and persistence layers to an external repository which will be brought into the importer as an external dependency.

Approach:

The importer should be built with Spring Batch and the SQL statements should be implemented with MyBatis-Spring.
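For readers new to Spring Batch, the chunk-oriented pattern it formalizes (read one record, process it, write records in batches) can be sketched without any dependencies. This is only an illustration under stated assumptions: `ChunkImportSketch`, `ChunkWriter`, and `runStep` are hypothetical names, not Spring Batch or cBioPortal APIs, and the tab-delimited rows are made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, dependency-free sketch of chunk-oriented processing.
// None of these names come from Spring Batch or the cBioPortal codebase.
class ChunkImportSketch {

    /** Receives one chunk of processed records, e.g. to run one batched SQL insert. */
    interface ChunkWriter<O> {
        void write(List<O> chunk);
    }

    /**
     * Reads items one at a time, transforms each, and hands them to the writer
     * in chunks of at most chunkSize. A null processor result filters the item out.
     * Returns the number of items written.
     */
    static <I, O> int runStep(Iterator<I> reader, Function<I, O> processor,
                              ChunkWriter<O> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<O> chunk = new ArrayList<>(chunkSize);
        int written = 0;
        while (reader.hasNext()) {
            O out = processor.apply(reader.next());
            if (out != null) {
                chunk.add(out);
            }
            if (chunk.size() == chunkSize) {
                // Pass a copy so the writer may keep the list after we clear ours.
                writer.write(new ArrayList<>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, possibly smaller, chunk.
            writer.write(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for a data file; "#" marks a header/comment line.
        List<String> lines = List.of("#sample\tgene", "S1\tTP53", "S2\tKRAS", "S3\tEGFR");
        List<List<String>> chunks = new ArrayList<>();
        Function<String, String> parse =
                line -> line.startsWith("#") ? null : line.replace('\t', ':');
        ChunkWriter<String> collect = chunks::add;
        int n = runStep(lines.iterator(), parse, collect, 2);
        System.out.println(n + " records written in " + chunks.size() + " chunks: " + chunks);
    }
}
```

In a real Spring Batch job the reader, processor, and writer would be framework components wired into a step, and the writer is where the MyBatis-Spring (or JDBC) statements would run once per chunk.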

The bulk of the importer logic has already been implemented and can be reviewed in this pull request. However, when this work began, MyBatis-Spring did not yet support the latest Spring Batch version available at the time, so the SQL statements were implemented using JDBC.

Initial development efforts on migrating the cBioPortal model and persistence layers can be reviewed in this fork of the cBioPortal/common repository. The importer should be able to pull in any version of the cbioportal/common dependency with JitPack.
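As a rough sketch of what pulling the dependency through JitPack could look like in the importer's `pom.xml`, assuming a Maven build: JitPack builds coordinates from the GitHub owner and repository name, so the `groupId` and `artifactId` below are inferred from `cbioportal/common` rather than confirmed, and `VERSION` is a placeholder. A multi-module repository may need different coordinates.

```xml
<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <!-- Assumed coordinates: com.github.<owner> and <repository name> -->
    <groupId>com.github.cbioportal</groupId>
    <artifactId>common</artifactId>
    <!-- Placeholder: a release tag, a commit hash, or BRANCH-SNAPSHOT -->
    <version>VERSION</version>
  </dependency>
</dependencies>
```

Pinning `VERSION` to a tag or commit hash keeps importer builds reproducible, while a branch snapshot tracks ongoing work in the common repository.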

Required skills: Java, SQL, working knowledge of command-line tools

Possible mentors: @ao508 @n1zea144

kloun commented 5 years ago

I am also interested in participating in this project. I am familiar with Java and SQL.

rishabhBudhouliya commented 4 years ago

Hi, is this issue still open? I mean, is someone working on this?

ao508 commented 4 years ago

@rishabhBudhouliya Yes, this is still an open issue and project idea for GSoC.

rishabhBudhouliya commented 4 years ago

@ao508 Hi, should I try to understand and start on this or are there any initial tasks you would want me to do first, which would make me more comfortable with the codebase?

ao508 commented 4 years ago

@rishabhBudhouliya The Spring importer code has drifted out of date with the current importer classes checked into the cbioportal/master codebase. I will discuss with our team to identify the best area to pick this up.