ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0
87 stars 52 forks source link

code modularity #268

Open chubukov opened 7 years ago

chubukov commented 7 years ago

I'm wondering what we can do to improve code organization and modularity and how we should organize the effort, assuming it's worth doing. The thing that most concerns me is the amount of repeated code. For instance, the logic for the new EIC sum/max feature is repeated basically identically in three different places. As another example, the code for pullIsotopes() and pullIsotopesBarPlot() is 90% identical. This is not limited to recent changes by any stretch, there are tons of legacy code segments that have identical logic implemented in multiple parts of the code.

I think it's a problem because it makes it much harder to get started with the code, and it makes concurrent development much trickier. It also makes it harder to track down bugs in new features.

I think the original refactor that started ElMaven made some significant improvements in this regard from original Maven, but I don't entirely remember.

chubukov commented 7 years ago

A similar issue is that I worry that we're getting away from the object-oriented approach. Here's an example function definition (just one example of many).

double PeakGroup::getExpectedMz(int charge, map<string, bool> isotopeAtom, int noOfIsotopes)

This function seems to get called with arguments taken from main window parameters, and it checks if the isotope that the user is looking for is within the maximum M+n. Why does any of this need to happen here? It seems that PeakGroup::getExpectedMz() should really be available with no arguments at all -- all the parameters (like charge, if needed) should be already available in the PeakGroup object. And whether this particular PeakGroup is an isotope that's within the allowed scope is a check that should happen completely outside, if it even needs to happen.

Similar examples seem to be popping up a lot. Again, this just goes to modularity -- if we can make the organization more consistent it will make it easier for more people to develop concurrently.

It may be that I'm missing something about what these functions need to do and why they're organized the way they are. But if that's true, then I fear almost no one will be able to follow the logic.

sahil21 commented 7 years ago

@chubukov We are setting up contributing/code guidelines for developers/contributors. Also, we'll put in place PR review guidelines. We have identified the following steps to improve the quality of code:

  1. Putting in place PR review guidelines (Only modular code with proper documentation will be merged)
  2. Freezing Coding Style Guide
  3. Learning about workflow documentation guidelines from another open source project
  4. Following guidelines for better commit messages: How to Write a Git Commit Message

We'll refer some good open-source c++/qt repositories like

  1. Chromium
  2. Qt Creator
  3. Shotcut

We'll make sure that the following are in place in Sprint 8.

chubukov commented 7 years ago

@sahil21 that sounds good, and will probably help. I do think that we might have to do some more global refactoring at some point to really address some of the messiness that's been present for a long time.