ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.
Apache License 2.0

A State Testing Platform for Critical Scientific Data Development #6

Closed (RBerkheimer closed this issue 6 years ago)

RBerkheimer commented 6 years ago

Idea

Harness was initially developed as a state testing platform for the PHA (Pairwise Homogeneity Algorithm), a complex and rigorous climatological algorithm that performs automated quality control on the temperature station readings of the official climate record of the United States.

The idea is to provide a pseudo workflow system to parse entire projects gathered from version control systems as components, identify parts of the projects as functions, and join the functions together in workflows to produce structured output (which the user defines) that can be compared as the code producing these structures changes over time.
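As a rough illustration of the idea above (not Harness's actual API), component functions can be joined into a workflow that yields a structured, comparable output. Every name here (`make_workflow`, `load_readings`, `adjust`, `summarize`, and the sample readings) is a hypothetical stand-in:

```python
import json
from functools import reduce
from typing import Any, Callable, Dict, List

# A step takes the current state dict and returns a new one.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def make_workflow(steps: List[Step]) -> Step:
    """Join component functions into one callable that runs them in order."""
    def run(state: Dict[str, Any]) -> Dict[str, Any]:
        return reduce(lambda acc, step: step(acc), steps, state)
    return run


# Hypothetical component functions standing in for parts of a parsed project.
def load_readings(state: Dict[str, Any]) -> Dict[str, Any]:
    return {**state, "readings": [10.0, 12.5, 11.0]}


def adjust(state: Dict[str, Any]) -> Dict[str, Any]:
    offset = state.get("offset", 0.0)
    return {**state, "readings": [r + offset for r in state["readings"]]}


def summarize(state: Dict[str, Any]) -> Dict[str, Any]:
    readings = state["readings"]
    # The user-defined structured output that later runs are compared against.
    return {"count": len(readings), "mean": sum(readings) / len(readings)}


workflow = make_workflow([load_readings, adjust, summarize])
output = workflow({"offset": 0.5})

# Serializing with sorted keys gives a stable text form that is easy to compare.
print(json.dumps(output, sort_keys=True))
```

If `adjust` changes in a later revision, rerunning the same workflow and comparing the serialized output shows whether the result changed.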

Harness performs state testing by allowing projects to be locked in place at a given revision, or their output state to be saved, so that outputs can be compared easily. This allows complex, quickly changing, scientifically critical data to be analyzed visually through a web-based, single-page Angular application. Harness aims to be easy for developers to use: it is configurable almost entirely through the UI.

Funding for Harness evaporated when the PHA project was completed in 2016, but the project was proven to work well and is ripe for development and new ideas. It has the potential to serve as a powerful tool for many other scientifically critical projects.

Students would be working on improving Harness, with the following goals in mind:

Skills Needed

Mentors

Ryan Berkheimer, Senior Software Engineer, GST, NOAA Affiliate

RBerkheimer commented 6 years ago

For interested students, I'm sharing a reply I sent to a student who contacted me by email about this project:

According to GSoC rules, student applications open on March 12th. But I can help you get started planning your application. I'm glad you already read the white paper - did you watch the ESIP presentation as well? It provides some more context for the project and its architecture.

The project has been officially unfunded since 2016, but since then I have updated it to run on Python 3, along with the MARS structures system that it uses for parameterization. I will provide both updated repositories, with an updated README and developer's guide, by the end of February on my GitHub account and post links to them in the project. When they become available, you will be able to see how the project is set up and how it works, and get it running with the end-to-end tests that are set up with example data.

The project itself has many avenues for development - I see an immediate need for user accessibility, deployment automation, additional VCS support, UI development, additional language interoperability through the JVM, and performance improvements in the dynamic class-loader.

Which of these project pieces do you find most interesting and would you like to make the focus of your development? That would be an important question to determine a path forward. We can make the focus one of these or several, depending on your interests. I would also suggest gaining at least a basic understanding of the concept of universal data models and distributed workflow systems (if you haven't already studied these topics), as those concepts are essentially wedded in this project.

In my personal view, the greatest needs are automated generation of closure wrapping around component functions (automatic code generation), additional language interop with the JVM, a fully fleshed out UI, and threading support improvements (possibly with some distributed solution) for the dynamic class loading, in that order.
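To make "closure wrapping around component functions" concrete, here is a minimal sketch, assuming each component should expose a uniform state-in, state-out interface. It builds the wrapper at runtime with `inspect.signature` rather than generating source code, and `wrap_component` and `scale` are hypothetical names, not part of Harness:

```python
import inspect
from typing import Any, Callable, Dict


def wrap_component(
    func: Callable[..., Any], output_key: str
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Return a closure that feeds func from a shared state dict.

    Arguments are looked up by parameter name; parameters missing from the
    state fall back to func's own defaults.
    """
    params = list(inspect.signature(func).parameters)

    def wrapped(state: Dict[str, Any]) -> Dict[str, Any]:
        kwargs = {name: state[name] for name in params if name in state}
        # Store the result under output_key without mutating the input state.
        return {**state, output_key: func(**kwargs)}

    wrapped.__name__ = f"wrapped_{func.__name__}"
    return wrapped


# Hypothetical component function discovered in a project.
def scale(values, factor=2.0):
    return [v * factor for v in values]


step = wrap_component(scale, "scaled")
result = step({"values": [1.0, 2.0], "factor": 3.0})
default_result = step({"values": [1.0, 2.0]})
print(result["scaled"], default_result["scaled"])
```

Because every wrapped component has the same signature, wrapped functions can be chained without hand-written glue for each one.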

To give you an idea of the end goal of this as a summer project, I envision that its success would manifest as a successful test against a randomly selected project whose technologies fit inside our capabilities. For example, if we end up with the ability to support Java, Fortran, and Python projects housed in un-versioned filesystems and git repositories, we should be able to select a random project within these bounds, quickly set up stateful tests against it, make modifications, and identify state changes (basic diff analysis) across these iterations. Additional analyses and/or capabilities would be icing on the cake.
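The "basic diff analysis" mentioned above could look something like the following sketch, which compares the structured output saved for two revisions. This is an assumption about the shape of the comparison, not Harness's implementation; `flatten`, `diff_states`, and the sample station data are hypothetical:

```python
from typing import Any, Dict, List, Tuple

Change = Tuple[str, str, Any, Any]  # (kind, path, old value, new value)


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a {dotted.path: leaf} mapping."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def diff_states(old: Any, new: Any) -> List[Change]:
    """List the leaves that were added, removed, or changed between states."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes: List[Change] = []
    for path in sorted(old_flat.keys() | new_flat.keys()):
        if path not in new_flat:
            changes.append(("removed", path, old_flat[path], None))
        elif path not in old_flat:
            changes.append(("added", path, None, new_flat[path]))
        elif old_flat[path] != new_flat[path]:
            changes.append(("changed", path, old_flat[path], new_flat[path]))
    return changes


# Hypothetical outputs saved for two revisions of the same project.
rev_a = {"station": "USW0001", "temps": [10.0, 12.5], "flags": {"qc": "pass"}}
rev_b = {
    "station": "USW0001",
    "temps": [10.0, 12.7],
    "flags": {"qc": "pass", "adjusted": True},
}

changes = diff_states(rev_a, rev_b)
for change in changes:
    print(change)
```

An empty list means the two revisions produced the same state; any entries point to the exact fields where the output changed.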

The project was very exciting and fun to work on when funded and has the potential to be a tool of great utility to the community. I do see it as an evolution and/or complement to the testing tools currently available.

RBerkheimer commented 6 years ago

The Python 3 versions of the repos related to this project are now posted under the MIT license and live on GitHub:

https://github.com/RBerkheimer/pythoncommons
https://github.com/RBerkheimer/mars
https://github.com/RBerkheimer/harness

I've been working with a student on preparing an application for this project and we have the summer fairly well mapped out at this point, but we always welcome other interested parties.

SandySingh commented 6 years ago

Hello! Can I also work on this project? I know some Python, C and JavaScript and I'm new to Open Source.

RBerkheimer commented 6 years ago

Hello Sandy! Yes, we are looking for more contributors. You can definitely work on the project - are you looking to submit an application to GSoC as well? Please let me know.

SandySingh commented 6 years ago

Yes, I would like to submit an application to GSoC.

RBerkheimer commented 6 years ago

Great! The first step would be to get the repos and begin to get that code working. Once you have the repositories set up, try to follow the DEVELOPMENT_README to install the system, prepare the test environment, and run through a series of test evaluations.

I'm working on getting this set up with automated builds on push, but until then, you'll have to follow the readme. If you have issues, let me know. Feel free to email me at rab25@case.edu so we can talk in more detail about where we're at, what the needs are, what our focus is, and what you'd like to contribute.

I'm having a telecon with another student this morning to go over her application proposal - we are currently planning on focusing on fleshing out the UI in the first half of the summer term, and then moving toward implementing more code drivers and improving the performance of the native Python drivers. This is a project with a large scope, so there are plenty of pieces to choose from!

danijak commented 6 years ago

Hi, my name is Javed Ali, and I am a student at IIT Kharagpur. I want to contribute to this project. Could someone guide me?

SandySingh commented 6 years ago

@danijak First, follow the installation procedure in the comment above. Currently, installation is only possible in a Linux environment.