Develop a functional testing suite and leverage the existing tests in bedtools

arq5x commented 3 years ago

Could we leverage the existing test suite and replace the CLI with the grass equivalent?

Opening for discussion with @38 and @ArtRand

ArtRand commented 3 years ago

@arq5x sorry for the delay, I was OOO last week and just finished catching up. Using the existing test cases from bedtools makes a lot of sense. However, from looking at the repo, it doesn't look like grass is intended to be a CLI tool (@38 correct me if I'm wrong). Further, the "rust way" to test CLIs is to use something like assert_cli. That being said, if you could shim the existing test suite in, it would probably reduce the human error the most.

I'll take a swing at it, happy to discuss trade-offs further.

38 commented 3 years ago

Hi Arthur,

Thanks for looking into this. Let me give more details about the background.

This project is a very early stage of the query language project. It's indeed not a CLI in traditional definitions. Thus testing it could be very different from testing normal CLI.

This project aimed to address two pain point we seen from developing bedtools: 1. Bedtools is vey complicated but still not very flexible and 2. Dynamic dispatch hurts performance so much. Let me focus on the second pain point, bedtools is a traditional CLI tool, which means all the binary to handling inputs are hard coded already into the binary file. Thus, it requires a lot of dynamic dispatch to reduce the code redundancy, however, this also gives us a huge performance penalty. So the idea for improvement is: Can we design a tool that automatically generates statically dispatched, problem specific binary and let the compiler take care about the code optimization? That's why we are currently focused on code generation. One of the key idea is when we process a large data set, the performance penalty from dynamic dispatch is way higher than the time used to compile an optimized binary on the fly.

As you can see, grass has a simple DSL which compiles into Rust code already. Unfortunately, we don't have a proper CLI interface binary today (currently only have a shell script that simulates the CLI naively).

Please let me know if you have any problems.

Hao

On Thu, Aug 12, 2021 at 7:53 AM Arthur Rand @.***> wrote:

@arq5x https://github.com/arq5x sorry for the delay, I was OOO last week and just finished catching up. Using the existing test cases from bedtools https://github.com/arq5x/bedtools2/blob/master/test/intersect/test-intersect.sh makes a lot of sense. However, from looking at the repo, it doesn't look like grass is intended to be a CLI tool @.*** https://github.com/38 correct me if I'm wrong). Further, the "rust way" to test CLIs is to use something like assert_cli https://docs.rs/assert_cli/0.6.3/assert_cli/. That being said, if you could shim the existing test suite in, it would probably reduce the human error the most.

I'll take a swing at it, happy to discuss trade-offs further.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/38/grass-demo/issues/9#issuecomment-897658404, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXCF7VDXVXJPHTQMIZDOGTT4PG35ANCNFSM5BUIS2MA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

arq5x commented 3 years ago

I wonder if the shell script CLI could be used as a surrogate for bedtools when porting the testing infrastructure from bedtools?

ArtRand commented 3 years ago

@38 Understood. I bet that we can use the Bedtools suite as a correctness test. The DSL will probably need additional tests for specific functionality and I suspect there will also be some tests rendered unnecessary because the type checker will catch them for you. Personally (and not to be critical at all), the shell script-based tests look difficult to maintain, but maybe it's not that hard.

I'll open a PR once I have a scaffold of an idea and we can evaluate it.

ArtRand commented 3 years ago

@38 I finally had some time to make my way through the code. This morning, I noticed you added an implementation of invert to master. While working through the intersect tests, as you may imagine, I came across the need for this functionality pretty quickly. I was planning on adding something similar as a module under intersect - but I noticed you added it to the high_level_api. I'm just curious how you envision new functionality evolving. Maybe pull the invert functionality into the intersect module? I'm also happy to rebase and incorporate the new changes.

arq5x commented 3 years ago

Hi @ArtRand, @38 and I spoke about this at our meeting this morning. The invert() function is analogous bedtools complement. Are you looking for the equivalent of the -v option in bedtools intersect? @38 I wonder if calling your current invert() function complement() instead would be better and more familiar to the large number of existing bedtools users?

ArtRand commented 3 years ago

@arq5x your suspicion is correct, I was looking for something like the -v option in for intersect. Obviously (to me now), this is different than invert/complement. I guess my misinformed question distills to what belongs in the high_level_api module as opposed to the algorithms module, but perhaps the scope of that question is larger than this particular issue.

38 / grass-demo

Develop a functional testing suite and leverage the existing tests in bedtools #9