PART 2: Refactor dataset utils (DatasetUtils)

ezackr commented 1 year ago

NOTE: this is a continuation of the previous pull request. Please see the previous request before viewing this request.

Files added in this pull request:

star/tratto/data/oracles/JDoctorCondition.java
star/tratto/util/javaparser/DatasetUtils.java

JDoctorCondition.java provides useful enums for storing information used in DatasetUtils.java. DatasetUtils.java is the bulk of this request, involving functionality usable in both the construction of the oracles dataset and the main program. Methods are relatively bite-sized for re-purposing (and hopefully an easier review :')). Please let me know if any private methods should be made public for other parts of the library.

AML14 commented 1 year ago

Regarding test coverage: Same comment as in previous PR. The coverage is now down to 59%.

AML14 commented 1 year ago

I see that this PR still does not include some functionality for generating the oracles dataset (e.g., the class OraclesDataset is empty) nor for the final program (e.g., the OracleDatapointBuilder). I guess this is to make it easy to review different PRs focused on different functionality.

Anyways, I would also like to review that functionality. You may include it in this PR or a new one. Right now I'm a bit confused about the relation between this branch and refactor-dataset-oracles-java, since I noticed that they have deviated from each other (i.e., they have different changes applied). Will we have any problem when merging all the functionality into the main branch?

AML14 commented 1 year ago

I took the liberty to review the OracleDatapointBuilder from the refactor-dataset-oracles-java branch. It's probably good enough for our purpose, but I also find easier to read the syntax used in an example like this. Basically, the withXXX() methods modify the internal OracleDatapoint and return the same Builder instance, so that these calls can be chained as much as necessary.

AML14 commented 1 year ago

I continued reviewing the refactor-dataset-oracles-java branch on my local machine, to speed up the final merging process. While using Java 12-17 features is nice, we need to make sure that those don't cause any breaking changes. In particular, and as previously mentioned, the .toList() method callable upon a Stream (available since Java >=16) returns an immutable list, as opposed to the traditional .collect(Collectors.toList()), which returns a mutable list. See reference here. This is causing problems in many parts of the code where the latter option has been replaced with the former. The IDE itself may show warnings related to this, for instance:

Multiple tests are failing and the TokensDataset main program itself crashes.

My suggestion would be to replace all the .toList() calls with .collect(Collectors.toList()), just to be on the safe side.

AML14 commented 1 year ago

Related to the refactor-dataset-oracles-java branch and the generation of the oracles dataset:

There's a couple of tests failing in my local machine, namely, PathTest#testPath() and JDoctorConditionParserTest#testParseJDocConditions(). The latter is because the folders containing the actual source code of the Jdoctor projects are missing. Please coordinate with @darthdaver to upload the source to the repository. The former is failing in the following way:

Let me know when the source of the Jdoctor projects is uploaded. I want to try generating the oracles dataset and the tokens dataset based on it, to see if, after the refactoring, all oracles can be reconstructed based on the tokens dataset properly.

AML14 commented 1 year ago

@ezackr Let me know when you have addressed the remaining change requests, especially those related to the oracles dataset functionality. I need the Jdoctor projects source and the OraclesDataset main() in order to set up CI with the super-hyper-extra-ridiculously-large E2E test.

Also, as you did with the other PR, please include a response to each of my comments answering how you addressed the changes.

ezackr commented 1 year ago

I see that this PR still does not include some functionality for generating the oracles dataset (e.g., the class OraclesDataset is empty) nor for the final program (e.g., the OracleDatapointBuilder). I guess this is to make it easy to review different PRs focused on different functionality.

Anyways, I would also like to review that functionality. You may include it in this PR or a new one. Right now I'm a bit confused about the relation between this branch and refactor-dataset-oracles-java, since I noticed that they have deviated from each other (i.e., they have different changes applied). Will we have any problem when merging all the functionality into the main branch?

I will be adding a final PR request with all functionality by the end of the day for any remaining functionality. We may want to discuss a few cases I am concerned about in our meeting tomorrow, otherwise, everything is fully functional. I just figured DatasetUtils.java was a large enough file on its own.

ezackr commented 1 year ago

I took the liberty to review the OracleDatapointBuilder from the refactor-dataset-oracles-java branch. It's probably good enough for our purpose, but I also find easier to read the syntax used in an example like this. Basically, the withXXX() methods modify the internal OracleDatapoint and return the same Builder instance, so that these calls can be chained as much as necessary.

At the moment, this may be slightly difficult with the necessary exception handling, however, I can look into this further if we have more time.

ezackr commented 1 year ago

nice, we need to make sure that those don't cause any breaking changes. In particular, and as previously mentioned, the .toList() method callable upon a Stream (available since Java >=16) returns an immutable list, as opposed to the traditional .collect(Collectors.toList()), which returns a mutable list. See reference here. This is causing problems in many

Sounds good. I removed all calls using toList() and replaced them all with collect(Collectors.toList()).

ezackr commented 1 year ago

Related to the refactor-dataset-oracles-java branch and the generation of the oracles dataset:

There's a couple of tests failing in my local machine, namely, PathTest#testPath() and JDoctorConditionParserTest#testParseJDocConditions(). The latter is because the folders containing the actual source code of the Jdoctor projects are missing. Please coordinate with @darthdaver to upload the source to the repository. The former is failing in the following way:

Let me know when the source of the Jdoctor projects is uploaded. I want to try generating the oracles dataset and the tokens dataset based on it, to see if, after the refactoring, all oracles can be reconstructed based on the tokens dataset properly.

I fixed all the pathing necessary for refactoring. All tests are passing now. I'll coordinate to have the source repositories uploaded.

AML14 commented 1 year ago

I took the liberty to review the OracleDatapointBuilder from the refactor-dataset-oracles-java branch. It's probably good enough for our purpose, but I also find easier to read the syntax used in an example like this. Basically, the withXXX() methods modify the internal OracleDatapoint and return the same Builder instance, so that these calls can be chained as much as necessary.

At the moment, this may be slightly difficult with the necessary exception handling, however, I can look into this further if we have more time.

It's not urgent and maybe not even necessary, so leave it for now if you want.

AML14 / tratto

PART 2: Refactor dataset utils (DatasetUtils) #42