dondi / GRNsight

Web app and service for modeling and visualizing gene regulatory networks.
http://dondi.github.io/GRNsight
BSD 3-Clause "New" or "Revised" License
17 stars 8 forks source link

Refactor server code to facilitate error checking on multiple formats #300

Closed dondi closed 7 years ago

dondi commented 8 years ago

While implementing our first import function (for SIF) I noticed that many error checks that are performed for the original .xlsx format will apply to other imports as well. However, the current code structure prevents those from being leveraged easily. The code will have to be refactored this way:

kdahlquist commented 8 years ago

A reminder from issue #316:

"Node names must be 12 characters or fewer." —The import code does not enforce this yet; it’s one of the checks that needs to be refactored out of the current GRN-loading code.

kdahlquist commented 8 years ago

Checker code needs to be moved out of Excel function.

kdahlquist commented 8 years ago

@yshin4 and @mihirsamdarshi will now start working on this issue.

kdahlquist commented 8 years ago

@yshin4 will put her list of semantic vs. syntactic errors on the wiki.

Then she and @mihirsamdarshi can divvy up the semantic ones and start refactoring them.

yshin4 commented 8 years ago

posted to wiki. Will meet with @mihirsamdarshi and discuss.

kdahlquist commented 8 years ago

When you guys work on the dataTypeError, you should keep in mind issue #343 where @NAnguiano reported that the graph statistics function of Cytoscape won't allow special characters except for "-" and "_". Even though GRNsight can now handle them, when we implement the graph statistics, it will become a problem. It is probably better to just go ahead and restrict it now instead of waiting. This means that whenever GRNsight discovers a special character, it will need to error out and report to the user that it is not allowed.

yshin4 commented 8 years ago

Added specialCharacterError, will implement it in Try&catch in spreadsheet controller.

kdahlquist commented 8 years ago
  1. Create semanticChecker function
  2. remove code from Excel and put it into semanticChecker function
  3. call semanticChecker function from SIF and GraphML

@mihirsamdarshi should start with the Gene name length check because it is an easy one to do.

Also read the documentation page to get familiar with SIF and GraphML formats.

yshin4 commented 7 years ago

Done coding in specialCharacterError function and implementing it in Try and Catch. But I don't want to break anything so I'll check with other team members on Monday before pushing.

yshin4 commented 7 years ago

Implemented special character error. Edited both spreadsheet controller and test files. Updated Error Sort on Wiki.

kdahlquist commented 7 years ago

semanticChecker is almost working; Most tests are passing except for 3, they need to talk to @dondi in office hours this week.

They need source genes and target genes, but they can't get them both from network.

kdahlquist commented 7 years ago

The issue from the previous comment is still there; they need to talk to @dondi .

dondi commented 7 years ago

@mihirsamdarshi and @yshin4 can look at the (failing) error list then track down where the spurious errors could have been added. Following those should eventually lead to the offending code.

dondi commented 7 years ago

This is a good example for why test cases should be as narrow as possible, because another potential reason for the spurious error(s) is that the test data itself unintentionally has more semantic errors than intended. Ideally the test data should be formulated so that they are perfect except for the single error type being tested.

dondi commented 7 years ago

@yshin4 and @dondi met on Tuesday evening and determined that the core issue with @yshin4's duplicate check errors is that only the Excel format has a notion of row vs. column. Thus there are actually two types of duplicate errors: one that can differentiate whether the duplication occurred along a row or column dimension, and one that determines a duplication within the overall list of gene names. The former is thus more of a syntax error because it is file format dependent, and the latter remains as a semantic error.

Given this observation, @yshin4 will implement duplication checks at both levels: for the Excel reader only, we can detect duplications in rows vs. columns distinctly. For all other importers which do not have this concept, we will wait until the consolidated list of genes emerges from the import and then do a duplication check from that list.

kdahlquist commented 7 years ago

All of the cases on the wiki here have been implemented: https://github.com/dondi/GRNsight/wiki/Error-Sort

So, question for @dondi : are there additional semantic tests that need to be written or can we safely close this now because we have new issues for syntactic tests of graphml and sif #362 and #363.

dondi commented 7 years ago

I inspected the code and this specific issue can be closed. However I noticed that the semantic checker has only been integrated into the Excel reader, and not SIF nor GraphML. I will create those as separate issues.