Closed asfimport closed 8 years ago
Wes McKinney / @wesm: Can you explain how you envision regression testing fitting into the development workflow, as compared with functional unit tests (verifying correctness)?
Aliaksei Sandryhaila / @asandryh: Since we do not have writing functionality yet, the first iteration of tests will use pre-generated parquet files in /data/ (existing and new files).
Aliaksei Sandryhaila / @asandryh: In our case, regression testing will consist of running all functional unit tests on each modification. This will ensure that we do not mess up the already implemented, presumably correct functionality.
Wes McKinney / @wesm: How is this different from just running the test suite with ctest? That is already part of the Travis CI build script.
Aliaksei Sandryhaila / @asandryh: Ah, I missed that you've already added it in .travis.yml a few days ago.
Wes McKinney / @wesm: This JIRA does not have a well defined scope. Almost all patches need to be accompanied by unit tests – the problem right now is that we need a way to generate test data using parquet-mr (or some other tool) so that tests can be written right now for reader functionality until parquet-cpp has write capability. Another option is that we can mock out details of the file format (e.g. data and dictionary pages) and write tests that way (starting first with testing the value encoders and decoders so we know we can generate data pages in memory).
Aliaksei Sandryhaila / @asandryh: So far the JIRA is a bit vague because its first objective is to discuss and decide on the testing setup. :) Just to be clear: by "generate test data using parquet-mr," do you mean doing this offline and adding the files to the repository, e.g. to the /data directory?
Wes McKinney / @wesm: I definitely don't want to bloat the git repo. So if we go that route, either we would host test data files outside of the main git repo or have a data generation script that creates them from scratch locally. parquet-mr probably never had to face this issue because it was the proverbial chicken.
My preference would be to focus on testing round-tripping data from the ground up, but I also need to be able to write Parquet files =) It might be useful to have some "smoke tests" that use external pre-generated data files but it doesn't feel like a scalable solution (e.g. bug fixes may require generating the right file to reproduce a bug).
Aliaksei Sandryhaila / @asandryh: IMHO, it's not a big issue to add a few parquet files to /data for the time being. As soon as we can write, we'll remove these files and update the corresponding tests.
Wes McKinney / @wesm: This is fine with me, as long as we don't exceed a few megabytes. My priority will definitely be to have test fixtures ASAP that enable data to be round-tripped to in-memory buffers without having to assemble a fully formed file – for the purposes of verifying column reading you only need to be able to generate the different encoded page types.
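For illustration only, here is a minimal sketch of what such an in-memory round-trip check could look like for PLAIN-encoded INT32 values (4-byte little-endian, per the Parquet format). The helper names EncodePlainInt32, DecodePlainInt32 and RoundTripsInMemory are hypothetical and are not parquet-cpp APIs; a real fixture would also have to cover page headers and the other encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): writes each int32 value
// as 4 consecutive little-endian bytes into an in-memory buffer.
std::vector<uint8_t> EncodePlainInt32(const std::vector<int32_t>& values) {
  std::vector<uint8_t> buffer;
  buffer.reserve(values.size() * 4);
  for (int32_t v : values) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int shift = 0; shift < 32; shift += 8) {
      buffer.push_back(static_cast<uint8_t>((u >> shift) & 0xFF));
    }
  }
  return buffer;
}

// Hypothetical helper: reads the buffer back as little-endian int32 values.
// Trailing bytes that do not form a full 4-byte value are ignored.
std::vector<int32_t> DecodePlainInt32(const std::vector<uint8_t>& buffer) {
  std::vector<int32_t> values;
  values.reserve(buffer.size() / 4);
  for (std::size_t i = 0; i + 4 <= buffer.size(); i += 4) {
    uint32_t u = 0;
    for (int b = 0; b < 4; ++b) {
      u |= static_cast<uint32_t>(buffer[i + b]) << (8 * b);
    }
    int32_t v;
    std::memcpy(&v, &u, sizeof(v));  // reinterpret the bits as signed
    values.push_back(v);
  }
  return values;
}

// Round-trip check: encode to an in-memory buffer, decode, and compare.
// No file on disk is involved.
bool RoundTripsInMemory(const std::vector<int32_t>& values) {
  return DecodePlainInt32(EncodePlainInt32(values)) == values;
}
```

A unit test would then call something like RoundTripsInMemory({0, 1, -1}) and assert that it returns true, with no checked-in data file required.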
Wes McKinney / @wesm: I thought some more about this, and I'm not supportive of checking in more test data files until we've improved our ability to unit test the existing code (http://martinfowler.com/bliki/TestPyramid.html). Let's take the discussion to the mailing list thread about this, and as we identify well-defined tasks to improve the test infrastructure we can create new JIRAs.
Aliaksei Sandryhaila / @asandryh: This is not an issue, but rather a discussion on functional and integration tests. It has been moved to https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#.
We need to add a testing framework for unit tests, and run it as a part of each Travis CI build.
Reporter: Aliaksei Sandryhaila / @asandryh Assignee: Aliaksei Sandryhaila / @asandryh
Note: This issue was originally created as PARQUET-479. Please see the migration documentation for further details.