Add "syntactic" tests for graphml format

kdahlquist commented 7 years ago

Need to add more tests for graphml format input files. @ebachoura and @eileenchoe can either split SIF and graphml or both work on both. Discuss with @dondi this week.

kdahlquist commented 7 years ago

Assigning this solely to @ebachoura (@mihirsamdarshi has been given other tasks).

Here's how to approach this one:

Read the documentation on the Documentation page of the website to fully understand the specification for this format in GRNsight (which only uses certain aspects of GraphML).
Look in the test suite to see what tests are already there from @dondi's initial implementation
Enumerate the cases for which we don't have test coverage, looking at the cases for the Excel format may be helpful here. Perhaps we could have a meeting with just @kdahlquist, @eileenchoe, and @ebachoura to go over likely edge cases to look for. The errors and warnings glossaries on the wiki will also be helpful; it would be a good place to add any new cases.
Then write the tests.

kdahlquist commented 7 years ago

Warnings that have already been written are now showing up in the warnings box on GRNsight when use uploads bad file.

edlo commented 7 years ago

@kdahlquist I am free to meet during these times this week. I am also hoping to meet before Wednesday.

Tuesday 2:15 - 4:00
Wednesday 11:20 - 2:25

kdahlquist commented 7 years ago

@ebachoura I can meet on Tuesday at 2:15; please come to my office, LSB 289.

kdahlquist commented 7 years ago

@ebachoura and @kdahlquist just met and discussed strategy for how to approach these tests.

Use simple test files that are "bare bones" and hone in on the thing that needs to be tested. A simple graph of four nodes called A, B, C, D with four edges covering the different edge types was suggested.
Think "outside the box", i.e., don't approach the program in the way that a user is expected to do by following the directions. Instead, think of all the dumb things that could be done to break the program. Think about extreme values and edge cases.
The first test is whether it is a .graphml file and not some other filetype.
Then go line by line through the sample graphml file shown on the documentation page of the website and ask the questions:
- Does this line exist?
- is it formatted properly?
- does it have the required information
We are ignoring a lot of what graphml can do. Question is whether to warn the user that this is happening. @dondi should comment at Wednesday's meeting.
Keep in mind that some of the things that are in the GRNsight-exported graphml are there for compatibility with Cytoscape and yED and are not actually needed by GRNsight.
Ask questions on this issue.

dondi commented 7 years ago

Regarding point 5 in the above comment, there are actually precedents on both sides regarding this issue:

Web browsers ignore, without errors, any tags in HTML that they do not recognize
Microsoft Office and other productivity apps will frequently flag unused features when certain files are opened, particularly if such files are not in an app's native format

@dondi does not have a specific preference for which case should be followed; it all depends on how crucial it is for the user to know that the file they are using has data that is ignored in GRNsight, vs. the frequency with which users might load up GraphML.

edlo commented 7 years ago

I wrote all of the graphML test cases that I need and I wrote up the warning for the missing edge weight, but I still need to get it to display the warning text, so I will talk with @anuvarsh on how to do that properly.

kdahlquist commented 7 years ago

At the meeting last week, @ebachoura asked about whether he needed to test for a case where there were edges described, but no nodes. The answer was "yes" and led to the discussion about blank files which was documented in another issue #428.

Then the question became, "do I need to test to see whether each element of the GraphML" is present? and the answer is "yes". You also have to decide whether the absence of a missing element should be a warning (doesn't affect graph layout) or an error where the user shouldn't be allowed to do that.

It is worth @ebachoura meeting with @anuvarsh to discuss the types of cases needed. Even though @anuvarsh worked with .xlsx files, the principles are the same. There are going to be a lot of test files to cover all possible cases. As you can see, even with that, we are coming up with additional cases at this late date.

On wiki, there are errors and warnings glossaries that you should look at, e.g., here: https://github.com/dondi/GRNsight/wiki/Errors-Glossary

These could likely be updated with the new cases from SIF and GraphML.

kdahlquist commented 7 years ago

I tagged this issue with a "for next release" label because we want to incorporate the additional syntactic tests for graphml into the next release. The list of tests need not be exhaustive at this point (because we will likely add more), but whatever we have needs to function correctly (i.e., return the appropriate error or warning to the user).

So, @ebachoura, in the next week, please work on finishing off whatever tests you have written so far so that they can be merged and put into the next release.

edlo commented 7 years ago

@kdahlquist Does this mean that I should have all the test that I've written passing by next week? If so, I'm not sure that I will be able to do that

kdahlquist commented 7 years ago

Not necessarily, I'm saying that we should collect all the tests that are passing and giving the appropriate messages to the user and merge them into beta. To get closure by the end of the semester, you should list (maybe on the wiki) the ones that are passing and included in the release, and the ones that still need some work so that we can come back to it later and remember where we left off.

dondi commented 7 years ago

@ebachoura was stuck on how to handle errors reported by the lower-level parseString function and @dondi pointed out a strategy. He will now be able to return an error triggered at this point.

edlo commented 7 years ago

For this issue, I have been working on creating a list of error messages that are returned by the parser and pairing each error with the syntactic error that I inputted to get that error. As I am looking through this list of errors, first, I am noticing that there is a lot to cover, and second, one returned error message can symbolize a few very different things, and it is hard to distinguish which of the errors the parser is complaining about.

I am making this breakthrough too late into the semester and I am not sure that I can finish the testing and error throwing for all of these different cases by Wednesday. @dondi & @kdahlquist I was wondering if it is fine for me to finish this test next semester, and for the sake of the upcoming release, simply make a graphML general error that says that there is something wrong syntactically, just so that this next version of GRNsight won't have it's server crash when a syntactically flawed graphML is inputted. What do you guys think?

kdahlquist commented 7 years ago

Yes, that seems like a reasonable solution to me.

Please capture your work in progress on the types of errors and corresponding messages on a wiki page so that I can review it in the summer and make suggestions, perhaps streamlining the work when you return in the fall.

dondi commented 7 years ago

Agreed that we can start with a general syntax error. We can then refine it afterward.

edlo commented 7 years ago

ALL OF THESE ERRORS ARE BASED ON THIS FILE:

<?xml version="1.0" encoding="UTF-8"?>

1 -2 3 -4

NOT CAUGHT BY PARSE Inputted: key id="weight" for="edge" attr.name="weight" attr.type="double"/> Expected: **this is an issue because it doesn't load it as a weighted graph and just ignores all weights that we defined.

Inputted: <edge target="B">
Expected: <edge source="A" target="B"   >

THIS ONE CRASHES OUR SERVER
Inputted: <node/>
Expected: <node id="A"/>

Error: Unexpected close tag Inputted: Expected:

Error: Invalid attribute name Inputted: <graph id="G" edgedefault="directed" Expected:

Inputted: <data key="weight"3</data>
Expected: <data key="weight">3</data>

Error: Forward-slash in opening tag not followed by > Inputted: key id="weight" for="edge" attr.name="weight" attr.type="double"/> Expected:

Error: Invalid characters in closing tag Inputted: 1</data Expected: 1

Inputted: </edge
Expected: </edge>

Inputted: </graph
Expected: </graph>

Inputted: < id="A"/>
Expected: <node id="A"/>

Error: Unclosed root tag If you leave out the tag completely or any part of it.

I apologize in advance for how difficult this is to read.

dondi commented 7 years ago

@ebachoura The pull request is this close to being accepted, so when you can, just tweak as requested and this can be a go.

kdahlquist commented 7 years ago

As discussed, we are tabling the rest of these tests for later, so I'm pulling "for next release" tag off.

kdahlquist commented 7 years ago

Oops, sorry, put this comment in the wrong place, here it is again.

As noted on issue #428, an empty .graphml file crashes the server. Because @ebachoura is not around this summer, I was wondering if there is a quick fix to adding a test case for this instance since we know that it has a bad outcome.

eileenchoe commented 7 years ago

Empty GraphML file fix live on Beta v2.1.6

kdahlquist commented 7 years ago

Verified that fix by @eileenchoe is in beta. Removing "review requested" tag and lowering the priority level again for when @ebachoura rejoins in Fall.

kdahlquist commented 7 years ago

We are going to take the point of view that from a user perspective GRNsight should return an "error" if the graph is incorrect with respect to the underlying data.

kdahlquist commented 7 years ago

@ebachoura needs to write out his action plan on this issue today before leaving the meeting.

edlo commented 7 years ago

Met with @dondi to discuss and establish the manner by which I would be pairing the parser's error message with it's appropriate GRNsight error/warning. We decided on this style object:

var errorToAppropriateMessage = {
    "parser's error message": graphmlErrors.GRAPHML_GENERAL_SYNTAX_ERROR,
    etc...
};

I setup this object and put it in a separate file with all of the graphML specific warnings and error. I also wrote a function that properly parses all of the information from the parser's error and puts it into an object we can use.

Now I am doing the slightly tedious job of testing different errors manually and documenting the error messages into an excel document and splitting them into three different categories: errors that aren't caught by the parcer and crash GRNsight, errors that are caught by the parcer, and errors that aren't caught by the parcer and don't crash GRNsight and actually just end up showing unexpected graph values.

Once I feel that I have discovered most of the errors, I am going to meet with @dondi again to discuss callback functions, so that I can start working on the code that returns the proper errors.

edlo commented 7 years ago

Documented every error that I could find. @kdahlquist I hear you are really good at finding errors like these, so maybe we could meet sometime to see if you can get the parser to throw any errors messages that I haven't already seen.

@dondi & @kdahlquist We also have a decision to make. A few of the error messages that are returned from the parser aren't really that specific, so like one error message can mean a few different things and some are very skewed from what the error message says. For example, if my file had this code <node id="A" instead of <node id="A"/> , the parser returns the error "Invalid attribute name" which doesn't exactly summarize that error at all.

Good news is they give a line and column number so even if the error isn't that clear or specific, we can at least let the user know that there is an error at that location.

@dondi Also, I'll be passing by your office hours tomorrow as soon as my class is done, 12:30ish, so that you can teach me the callback method that we have been talking about. Also, maybe get some help on identifying the mystery object.

kdahlquist commented 7 years ago

I can help with finding errors, but not until later in the week--I'm swamped until our meeting on Wednesday.

dondi commented 7 years ago

@ebachoura is not blocked by the inscrutable error object so he can proceed, but to contribute to the community he will contact the xml2js author or possibly the author of sax-js (https://github.com/isaacs/sax-js), which appears to be the true source of the err object.

We also cleared up the callback sequence and @dondi gave @ebachoura a sketch of how to structure the code for when a result comes back fals(e?)y.

kdahlquist commented 7 years ago

@ebachoura, are we meeting? I thought we said 12:30.

edlo commented 7 years ago

@kdahlquist and I met on Friday 9/22 and she gave me a list of a few more things to test to make sure that I had all the relevant errors being thrown. With the list that she gave me, I found a few more errors that are thrown and one more syntax error that crashes GRNsight.

Next step: Writing the GRNsight errors that should be thrown for each of the parser errors. Should all be done before next meeting, sorry this week wasn't as productive as hoped.

edlo commented 7 years ago

Alright, so here is how the GraphML error catcher and thrower is setup right now.

start with err from parser >> parse error into object >> pass this object into pairError function >> function pairs the error message from the object to the relevant GRNsight error using the errorMessageToGraphmlError object and then calls that error and pushes it into the error object of network

@dondi It shouldn't take too long, but I need your help recreating the callback changes. It seems the order of everything is a bit off.

dondi commented 7 years ago

@ebachoura is in the process of finalizing error messages caught by the parser as a last step toward finishing this task.

dondi commented 7 years ago

@ebachoura is at the point where some messages are now coming full circle (i.e., they appear in the web app error dialog) and will go through all remaining messages for the remainder of the time.

Given some code review, it was concluded that (for now) it is OK to insert simple HTML tags in the error strings because they get rendered as HTML (including brs for line breaks).

edlo commented 7 years ago

Here is the wonky file: test.txt

edlo commented 7 years ago

So yeah @dondi I figured out the issue with that one file, and yes it turned out to be a dumb mistake, as expected lol. And I think it's time to celebrate because I'm done! Making this review requested @kdahlquist

kdahlquist commented 7 years ago

After this is PR'ed, code reviewed, and merged, we can look toward doing a release.

dondi commented 7 years ago

Issue resolved; release in progress in #548

dondi / GRNsight

Add "syntactic" tests for graphml format #363