git-abhishek / gsoc-2018


Proposal Comments #1

Open ngoldbaum opened 6 years ago

ngoldbaum commented 6 years ago

Hi Abhishek,

First of all, this is a really great proposal as-is. I'm looking forward to hopefully being able to work with you this summer.

I'm pinging @xarthisius and @colinmarc to look over this proposal before the GSOC submission deadline (March 27) if they have time.

I don't really have any line-by-line critiques so please see below for my more nebulous comments :)

Cython Coverage

Have you thought at all about coverage for yt's Cython code? Cython does have support for coverage.py-based coverage analysis: http://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html#enabling-coverage-analysis

I know that Kacper at one point tried to add coverage support and gave up when he realized that turning on coverage for our Cython code made the codebase much slower. This means you will probably not be able to turn on Cython coverage in our main test suite (unless things on the Cython side have improved since Kacper tested this), and you may need to consider turning it on only for a subset of tests that are explicitly written to exercise the Cython code.
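For reference, here is a minimal sketch of what opting in to Cython line tracing could look like, based on the linked tutorial. The module name, the CYTHON_COVERAGE environment variable, and the single-extension layout are all placeholders; yt's real setup.py is more involved. Coverage.py also needs plugins = Cython.Coverage under [run] in .coveragerc for the traced extensions to show up in reports.

```python
# setup.py sketch: build Cython extensions with line tracing only when
# CYTHON_COVERAGE=1 is set, so the slowdown is opt-in rather than the default.
import os

from setuptools import Extension, setup
from Cython.Build import cythonize

trace = os.environ.get("CYTHON_COVERAGE", "0") == "1"

extensions = [
    Extension(
        "yt.utilities.lib.example_module",        # hypothetical module name
        ["yt/utilities/lib/example_module.pyx"],   # hypothetical source file
        # CYTHON_TRACE_NOGIL compiles in the tracing hooks coverage.py needs;
        # this is also where the slowdown comes from.
        define_macros=[("CYTHON_TRACE_NOGIL", "1")] if trace else [],
    )
]

setup(
    ext_modules=cythonize(
        extensions,
        # The linetrace directive generates the per-line tracing code.
        compiler_directives={"linetrace": trace},
    )
)
```

With something like that in place, the Cython-focused subset of tests could run in a dedicated CI job so the main suite stays fast.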

Current coverage

I think your current 26% estimate is more of a lower bound. For example, you're currently including __init__.py and api.py files in the coverage measurement; for the most part those only exist to define yt's API and aren't really part of the code we'd care about testing. In addition, you're measuring vendored code, which should probably be ignored.

I'd take a look at our flake8 exclude rule:

https://github.com/yt-project/yt/blob/2a0e17ad25353790ea6d707c4e56021612d63ca3/setup.cfg#L10-L16

This pretty exhaustively lists the modules we don't care about for linting, coverage, and testing purposes.
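As a rough illustration of ignoring those modules in the coverage measurement itself, coverage.py can be driven like this; the omit patterns below are only examples, and the authoritative list is the flake8 exclude rule linked above.

```python
# Sketch: measure coverage programmatically while omitting API shims and
# vendored code. The omit patterns are illustrative, not exhaustive.
import coverage

cov = coverage.Coverage(
    source=["yt"],
    omit=[
        "*/__init__.py",   # package init files that mostly just import things
        "*/api.py",        # modules that only re-export yt's public API
        "yt/extern/*",     # vendored third-party code
    ],
)
cov.start()
# ... run the test suite here (e.g. via nose or pytest) ...
cov.stop()
cov.save()
cov.report(show_missing=True)
```

The same patterns can equivalently live in the omit option of a .coveragerc file or a [coverage:run] section in setup.cfg.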

I think you're also only using a subset of the unit tests to generate coverage reports; there are a number of unit tests that are only triggered if you have test data downloaded or optional dependencies installed. Including the answer tests would probably increase coverage further. Ideally, one thing you will do in this project is reduce the number of tests that require sample data as well as the number that require answer comparisons, so that more code is tested purely by the unit tests.

That said, as detailed above, the coverage estimate is likely missing all of our Cython code, so the real coverage might actually be below 26% once that code is accounted for.

Test runtime and test performance

Perceptual hashes seem cool! That's the first I've heard of them. We could also think about just including the "correct" answer images in the repo and comparing against them directly, as matplotlib does, and then getting rid of all the image and plot answer tests.
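To make the baseline-image idea concrete, here is a sketch using the pytest-mpl plugin; choosing that plugin is my assumption, not something yt does today, and the test body is a stand-in rather than a real yt plot.

```python
# Sketch with pytest-mpl: the decorated test returns a figure, and the plugin
# compares it against a baseline PNG that is checked into the repository.
import matplotlib.pyplot as plt
import pytest


@pytest.mark.mpl_image_compare(baseline_dir="baseline", tolerance=5)
def test_projection_plot_image():
    # Hypothetical figure; a real test would build it with yt's plotting
    # machinery instead.
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    return fig
```

Baselines would be generated once with pytest --mpl-generate-path=baseline and committed, and the comparison runs whenever the suite is invoked with pytest --mpl.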

Does running the tests with Python coverage turned on appreciably slow things down? How about with Cython coverage turned on?

I think the "Decrease test time proportionally" phase 2 goal is really something that should be happening as you work through the rest of the codebase. More realistically, you will be identifying tests that are currently slower than some threshold, or that seem to needlessly require a test dataset to run. More than likely, slow tests and tests that need big datasets can be rewritten to avoid those requirements.
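Finding the slow tests is easy enough with timing output (e.g. pytest's --durations flag). For the bookkeeping, one possible sketch, assuming a pytest-style runner and a made-up "slow" marker and --run-slow option:

```python
# conftest.py sketch: tests marked @pytest.mark.slow are skipped by default
# and only run when --run-slow is passed on the command line.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-slow",
        action="store_true",
        default=False,
        help="also run tests marked as slow",
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-slow"):
        return
    skip_slow = pytest.mark.skip(reason="needs --run-slow to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Tests flagged this way stay in the repository but only run when explicitly requested, which keeps the default run fast.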

Analysis modules

We are currently planning to remove the yt.analysis_modules submodule in favor of affiliated packages outside of the yt package itself. The justification for this decision is laid out in YTEP-0029 and in this yt-dev mailing list thread. Basically, historically we have not been able to support code in yt.analysis_modules at the same level as the rest of the package, because it ends up being very domain-specific code that few people know much about.

Since much of the code in yt.analysis_modules will be removed from the main yt package, you should not spend time working on it and should instead only work on the code that will remain in yt after yt 4.0 is released.

More detail about expanding testing for different geometries and data styles

It would be nice to get some high-level detail about how you'd like to expand coverage for different data styles.

One way to do this would be to expand on our existing test functions, fake_random_ds, fake_amr_ds, and fake_particle_ds, which return a uniform resolution grid, an AMR dataset, and a particle dataset, respectively, filled with "fake" randomly generated data.

Maybe you could define a fake_test_datasets generator that test functions could iterate over, repeatedly testing the same functionality and syntax against test datasets with different underlying data styles or geometries. Perhaps this could be a parametrized test fixture (see the sketch at the end of this section)? Ideally there would also be a way to explicitly mark tests that are known to fail for certain geometries or data styles; that would give us a better picture of which functionality works for which data styles, and would help us figure out where to focus to improve support for those data styles and geometries.

Of course we want to avoid combinatorial explosion of tests, especially for tests that might be slow, so we'll need to be careful to avoid having too many permutations.
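Here is the kind of parametrized fixture I have in mind, as a minimal sketch: fake_random_ds, fake_amr_ds, and fake_particle_ds are the real helpers from yt.testing, while the fixture name, the test body, and the xfail mark are all made up for illustration.

```python
# Sketch: every test that takes the fake_ds fixture runs once per entry in
# DATASET_BUILDERS, i.e. once per data style.
import pytest

from yt.testing import fake_amr_ds, fake_particle_ds, fake_random_ds

DATASET_BUILDERS = [
    pytest.param(lambda: fake_random_ds(16), id="uniform-grid"),
    pytest.param(lambda: fake_amr_ds(), id="amr"),
    pytest.param(
        lambda: fake_particle_ds(),
        id="particles",
        # Purely illustrative: mark combinations known to fail so the gaps
        # show up in the report instead of being silently skipped.
        marks=pytest.mark.xfail(reason="hypothetical known failure"),
    ),
]


@pytest.fixture(params=DATASET_BUILDERS)
def fake_ds(request):
    # Build datasets lazily inside the fixture so test collection stays cheap.
    return request.param()


def test_basic_selection(fake_ds):
    # Placeholder assertions; a real test would exercise actual functionality
    # (selections, derived fields, plots, ...) with the same syntax for every
    # data style.
    ad = fake_ds.all_data()
    assert fake_ds.domain_width.min() > 0
    assert ad.ds is fake_ds
```

Keeping the parameter list short and building the datasets lazily inside the fixture is one way to keep the permutation count and the collection time under control.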

colinmarc commented 6 years ago

This is a really thorough proposal! I'd agree with the idea of working on test runtime as you work on the other objectives, and I might even go a bit further: it could be high-leverage to focus on runtime before you do anything else. You'll be running the tests over and over for the other deliverables, so if you make the tests run fast first, that'll make the rest of the summer a bit less exhausting.

Glancing at the latest travis run, I noticed some low-hanging fruit right off the bat:

Looking forward to working with you! And hey, UMass! I used to deliver pizza for Athena's, by the Stadium.