AML14 / tratto

2 stars 1 forks source link

PART 1: Refactor dataset oracles java (appetizer) #41

Closed ezackr closed 1 year ago

ezackr commented 1 year ago

This pull request is composed of the main support functionality used to implement the OraclesDataset. It serves as a pre-cursor to the upcoming larger pull request. Please do not look at the other pull request before this one. There will be overlap between the pull requests, however, any redundant files may be ignored in the larger pull request.

AML14 commented 1 year ago

General comment: the test coverage is going down from over 90% to about 70%. I don't think it makes sense to start writing tests now since we are on a tight schedule. Also, as I've said in the past, as long as we can generate the oracles dataset, the tokens dataset, and reconstruct the oracles based on the tokens dataset, we should be good to go (that's the acid test).

In the future, try to accompany each piece of functionality with some kind of test (may it be unit, integration or E2E).

mernst commented 1 year ago

I am strongly in favor of setting up CI. It's best for you to do that since this repository is in your account. I can provide advice/opinions if you want it; just ask by email.

I agree that we don't need unit tests for all functionality, but covering the functionality in an end-to-end test is valuable if possible. My rule of thumb is to write a unit test for anything that feels like a library in that it has a simple-to-specify self-contained interface, so it could be reused in multiple places or even outside the project.

ezackr commented 1 year ago

I added some tests for JavaParserUtils functionality. I will also be adding a few tests for DatasetUtils in the other pull request (aside from the end-to-end Oracles Dataset generation test).

General comment: the test coverage is going down from over 90% to about 70%. I don't think it makes sense to start writing tests now since we are on a tight schedule. Also, as I've said in the past, as long as we can generate the oracles dataset, the tokens dataset, and reconstruct the oracles based on the tokens dataset, we should be good to go (that's the acid test).

In the future, try to accompany each piece of functionality with some kind of test (may it be unit, integration or E2E).

AML14 commented 1 year ago

Thanks a lot! This PR looks good. I'll let you know when I set up CI so that you can merge that change into this branch, have CI pass, and then @mernst can review this PR.

AML14 commented 1 year ago

@mernst @ezackr This branch is undergoing some relatively major changes and refactorings again. The problem is that the most important E2E test (generating oracles dataset, then tokens dataset, then reconstructing oracles based on tokens) is not set up in this branch because it lacks some necessary functionality (e.g., the OraclesDataset main()). Usually, what I do to make up for this is to periodically merge refactor-dataset-oracles-java into the most advanced branch where I'm working, i.e., tratto-program, and then make CI pass. However, refactor-dataset-oracles-java is currently failing, and it seems to be due to a regression (there's a specific test failing). Please, make sure that this and that branch are passing CI, then I will merge all changes into my branch, resolve conflicts if necessary, and make sure that no regressions have been introduced that make the E2E tests fail.

If you have suggestions regarding how to handle this (making sure that the changes introduced in these branches do not affect the E2E tests from the most recent branches), let me know.