Better testing infrastructure for slow-odgi

anshumanmohan commented 1 year ago

Two known sources of clunkiness:

- The .out files are generated by a shell script rather than by turnt, because I want an odgi command (odgi depth or odgi degree) to serve as the oracle, and then I want to run my Python version against that oracle.
- The test directories have copies of the GFAs, which is silly. For some reason I'm not able to run turnt path_to_the_gfas/*.gfa; I need to copy the GFAs into the same directory as the turnt.toml file and then run turnt *.gfa.

I'd super appreciate tips on one or both!

anshumanmohan commented 1 year ago

Testing crush_n properly reveals a couple interesting things!

- I used odgi crush as my oracle and then odgi view to get a GFA at the end. It seems that odgi will reorder H/S/L/P lines as it sees fit during its operations. Further motivation for a normalization pass? To have any chance of turnt-ing, I do exactly such a pass in my gen_outs.sh script.
- A few tests fail because odgi seems to replace the CIGAR string with a * while we keep it verbatim. In such cases, I have kept our output as filename.output and odgi's as filename.out. I'll ponder some more.
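
For readers following along, here is a minimal sketch of what a line-order normalization pass could look like. It is not the code in gen_outs.sh; the function name normalize_gfa and the choice to order record types as H, S, L, P and then sort lexically within each type are assumptions made for illustration.

```python
# Hypothetical normalization pass (not the project's actual script).
# Assumption: ordering by record type H, S, L, P, then by the raw line text,
# is enough to make two semantically equal GFAs compare equal as text.
RANK = {"H": 0, "S": 1, "L": 2, "P": 3}


def normalize_gfa(text: str) -> str:
    """Return the GFA text with its lines in a canonical order."""
    # Drop blank lines so trailing newlines do not affect the comparison.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Unknown record types sort after the known ones.
    lines.sort(key=lambda ln: (RANK.get(ln[0], len(RANK)), ln))
    return "\n".join(lines) + "\n"
```

Running both the oracle's output and our own output through the same pass means a plain textual diff only flags real differences, such as the CIGAR-versus-* mismatch described above, which this sketch deliberately does not hide.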

sampsyo commented 1 year ago

We chatted a bit about Turnt's environment support today! And I linked to it elsewhere; it's possible I should have mentioned that here instead.

Here is the deal with copying test inputs to other directories. The way Turnt works is that it looks for turnt.toml by walking "upward" from where the test file itself lives. That means that Turnt works the same way regardless of where you are "standing": running turnt foo.t and turnt ../../../foo.t and turnt bar/baz/qux/foo.t from different working directories (assuming those are all different paths to the same file) are all guaranteed to do the same thing. That means that the way you associate a turnt.toml with your tests is to put them in the same directory together (or turnt.toml can go in a parent directory).
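
To make the "walking upward" lookup concrete, here is a rough sketch of that kind of search. It is not Turnt's actual implementation; find_config is a hypothetical name, and the only thing taken from the description above is the idea of starting at the test file's directory and checking each parent in turn.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "turnt.toml"


def find_config(test_file: str) -> Optional[Path]:
    """Find the nearest turnt.toml at or above the test file's directory.

    Hypothetical helper for illustration; not part of Turnt's API.
    """
    # Resolve first, so the answer depends on where the file lives,
    # not on the current working directory.
    start = Path(test_file).resolve().parent
    for directory in [start, *start.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```

Because the search starts from the resolved location of the test file, foo.t, ../../../foo.t, and bar/baz/qux/foo.t all find the same configuration when they name the same file.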

I mentioned elsewhere that the solution to the duplication is to just put your new turnt.toml directly in test/. Then it will be shared by the files that are already there. And for testing different commands, you use different environments.

anshumanmohan commented 1 year ago

We no longer keep large graphs around in the repo. We fetch graphs from the web, and then use turnt environments to test these graphs against various algorithms. Run make test-slow-odgi to test crush, degree, depth, and emit.

- chop is WIP; watch this space.
- flip has two diffs, one interesting and one silly.
  - The interesting one is that, in LPA.gfa, L 2712 + 1546 + 0M gets changed by odgi flip to L 1546 - 2712 - 0M with no other change in the graph. We do not match this in our Python code because I don't understand it. I will ask on Matrix.
  - The silly one is that, in note5.gfa, we need to further hammer down the normalization of the GFAs we emit. I was proceeding with a light touch so far, just sorting using a (from, to) tuple, but this example shows that I need to enforce the order between, say, L 3 + 4 + 0M and L 3 - 4 + 0M.
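
Both flip diffs can be illustrated with a short sketch. The names link_key and complement are hypothetical, and integer segment names are an assumption that holds for the examples quoted above. The first function extends the (from, to) sort key with both orientations so that ties are broken deterministically. The second writes a link in its reverse-complement form, which in GFA describes the same adjacency; whether that equivalence is the reason odgi flip rewrites the LPA.gfa link is not confirmed anywhere in this thread.

```python
from typing import Tuple


def link_key(line: str) -> Tuple[int, str, int, str]:
    """Sort key for an L line: (from, from_orient, to, to_orient).

    Assumes integer segment names, as in the examples above.
    """
    _, src, src_orient, dst, dst_orient, *_ = line.split()
    return (int(src), src_orient, int(dst), dst_orient)


def complement(line: str) -> str:
    """Rewrite a link as its reverse complement: A+ -> B+ becomes B- -> A-.

    The overlap field is copied unchanged, which is only safe for
    symmetric overlaps such as 0M.
    """
    tag, src, src_orient, dst, dst_orient, *rest = line.split()
    flip = {"+": "-", "-": "+"}
    return "\t".join([tag, dst, flip[dst_orient], src, flip[src_orient], *rest])
```

With only (from, to) as the key, L 3 + 4 + 0M and L 3 - 4 + 0M compare equal, so a stable sort leaves them in whatever order they arrived in; the four-field key removes that dependence on input order.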

anshumanmohan commented 1 year ago

I've asked on Matrix re: the interesting case, and my most recent commit patches the silly case!

anshumanmohan commented 1 year ago

The headers, segments, and paths are being generated correctly when running chop. As we discussed at the previous meeting, I have dropped links entirely when it comes to this algorithm. To turnt-test this, I have added a flag to emit that lets us disable links in the generated GFA. So far it is only being used for chop.
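
As a rough illustration of the idea (not the actual emit code, and with a hypothetical function name and parameter), disabling links can be as simple as filtering out the L lines before the GFA is printed:

```python
from typing import Iterable, List


def emit_lines(lines: Iterable[str], show_links: bool = True) -> List[str]:
    """Return the GFA lines to print, optionally dropping all L (link) lines.

    Hypothetical stand-in for the real emit flag described above.
    """
    return [ln for ln in lines if show_links or not ln.startswith("L")]
```

Applying the same filter to the oracle's output keeps the comparison fair: both sides are then compared on headers, segments, and paths only.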

anshumanmohan commented 1 year ago

Merging this in for now. The Makefile target test-slow-odgi tests the algorithms chop, crush, degree, depth, and our baseline emit. The turnt output is noisy, since the .out files of each algorithm clobber those of the previous one. Look for the results of {algorithm}_test, not {algorithm}_oracle.

Once the ODGI folks have a response re: flip, I'll make a separate PR to add that into the test target after making any fixes.

cucapra / pollen

Better testing infrastructure for slow-odgi #27