TriBITSPub / TriBITS

TriBITS: Tribal Build, Integrate, and Test System,
http://tribits.org

CDash Report random test failure tool (#600) #603

Closed: achauphan closed this 5 months ago

achauphan commented 5 months ago

WIP: This PR is the implementation of a CDash Random Test Failure analysis tool related to:

achauphan commented 5 months ago

Force push to fix typo in d075467 commit message

bartlettroscoe commented 5 months ago

Commenting on @sebrowne's review above: you need system-level tests for every major use case supported by the tool, just to make sure the command-line arguments are interpreted correctly and you get the desired behavior. But most of the detailed code-coverage testing should happen below the system level (i.e., in unit tests). You need to be sparing with system-level tests because they are typically expensive and/or difficult to maintain compared to lower-level (unit-level) tests.

Updated: @sebrowne and @achauphan, do you want me to do a detailed code review of this PR? Just taking a quick look, there is really not that much production code, but it is not very unit testable the way it is currently written. The entire file tribits/ci_support/cdash_analyze_and_report_random_failures.py is only 342 lines long, but it is mostly a small class and some small functions followed by a large main() function, so it would be difficult to unit test as-is. If it were me, I would break that long main() function up into small functions and/or classes and have unit tests on the pieces. Those unit tests are much easier to write and maintain and might allow fewer system-level tests. In fact, if you develop the code with TDD, the unit tests come for free and you naturally end up with modular code. I can suggest how to refactor this code and make it more modular and unit testable if there is interest in that. Just let me know.
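To make that concrete, here is a minimal sketch (not code from this PR; the function and field names are hypothetical) of what pulling one small piece of logic out of a long main() could look like, together with a unit test for it:

```python
# Hypothetical sketch, not code from this PR: one small piece of logic
# pulled out of a long main() so it can be unit tested directly.
import unittest


def groupTestFailuresByNameAndSite(testDicts):
  """Group raw CDash test-failure dicts by (testname, site)."""
  grouped = {}
  for testDict in testDicts:
    key = (testDict['testname'], testDict['site'])
    grouped.setdefault(key, []).append(testDict)
  return grouped


class Test_groupTestFailuresByNameAndSite(unittest.TestCase):

  def test_two_failures_of_same_test(self):
    testDicts = [
      {'testname': 'MyTest_1', 'site': 'machine-a', 'buildName': 'b1'},
      {'testname': 'MyTest_1', 'site': 'machine-a', 'buildName': 'b2'},
      ]
    grouped = groupTestFailuresByNameAndSite(testDicts)
    self.assertEqual(len(grouped[('MyTest_1', 'machine-a')]), 2)


if __name__ == '__main__':
  unittest.main()
```

A test like that runs in milliseconds and never touches CDash or the filesystem, which is what keeps it cheap to maintain compared to a system-level test.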

achauphan commented 5 months ago

@bartlettroscoe I do agree that main() needs to be broken down into smaller units, and I would love and appreciate a more detailed review. However, I think it is @sebrowne's call whether I should put more time into this, as I've already taken too long on the story.

From a "lets get some data now on the problem" perspective, could it be worth a general review and standing it up for Trilinos to get a better picture of the frequency of random test failures and then assess how much more work should go into this?

bartlettroscoe commented 5 months ago

@achauphan and @sebrowne,

I may be able to use some other funding related to DevOps research and metrics gathering to refactor this code some and generalize it so that it can work for projects other than Trilinos. As currently written, it will only work for Trilinos PR builds. Therefore, I think if we keep the code as is, it may be better (for now) to keep the tool and its tests on a branch or put this code into its own git repo.

With that said, I don't think it is a lot of work to do the minimal refactorings needed to make 95% of this generic and keep it in the TriBITS repo and then have a Trilinos-specific driver in a different git repo (could be the main Trilinos repo or a different git repo).

What do you think?

achauphan commented 5 months ago

@bartlettroscoe I can't speak for @sebrowne, but I do agree that in its current state this is not at all ready for general use, so it should stay as a branch for now. If it's a good idea for me to keep working on this I can, but if you would like to take this on further and are able to, please feel free to do so. I also agree that it should not be too much work to refactor this and make it more generic.

sebrowne commented 5 months ago

@achauphan @bartlettroscoe

Looking over it again, I do not feel the Trilinos-specific pieces are significant enough to bother generalizing the code until there is a greater use case for doing so, primarily because it should be relatively easy to separate the general and the project-specific components later. That said, I think the argument that splitting it is simple enough to just take care of now is also perfectly valid. I don't have a strong opinion on this, as long as this code is in Trilinos and usable very soon. I want to be able to start setting up a Jenkins job with it, and if it needs to target the branch in this repository to do so, that's fine too. Our sprint ends on Monday 02/19, and if this capability is not working and emailing by then, I do not feel that I can justify spending more of the team's time on it.

On the subject of splitting up main(), I do not see the value if the cache-mock mechanism will still be used, because unit testing of the code currently in main() is blocked much more by the lack of library-function mocking than by it all being in one function.
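(For reference, "library-function mocking" here means something like the following sketch, where the CDash download call is patched out with unittest.mock so the surrounding logic can be tested without network access or a cached-file mock; all names below are placeholders, not code from this PR.)

```python
# Hypothetical sketch of library-function mocking: the CDash query
# helper is patched out, so the logic around it is unit tested without
# network access or a cache-mock. All names here are placeholders.
import unittest
from unittest import mock


def downloadTestsJson(cdashUrl, date):
  # Stand-in for a real CDash query helper (would do an HTTP request).
  raise RuntimeError("should be mocked out in unit tests")


def getTestingDayFailures(cdashUrl, date):
  rawJson = downloadTestsJson(cdashUrl, date)
  return [t for t in rawJson['tests'] if t['status'] == 'Failed']


class Test_getTestingDayFailures(unittest.TestCase):

  @mock.patch(__name__ + '.downloadTestsJson')
  def test_filters_failed_tests(self, mockDownload):
    mockDownload.return_value = {
      'tests': [{'status': 'Failed'}, {'status': 'Passed'}],
      }
    failures = getTestingDayFailures('https://some-cdash-site', '2024-02-01')
    self.assertEqual(len(failures), 1)


if __name__ == '__main__':
  unittest.main()
```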

achauphan commented 5 months ago

@sebrowne @bartlettroscoe

Is there a reasonable middle ground here for supporting cache-mock tests along with library-function mocking?

The argument for using a cache-mock system test would be, as @bartlettroscoe stated earlier, to test the program's command-line argument handling and check its overall desired behavior, at the cost of also having to run the library code, as @sebrowne mentioned.

Then utilizing library-function mocking would allow main() to be broken down into smaller testable units, as mentioned by @sebrowne, which we all seem to think is needed.

I believe this approach would lead to another module file, named something like CDashRandomFailures.py, that holds the smaller units of main() and could also house the generic strategy objects mentioned here.
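One possible shape for such a module (a sketch only; everything below other than the idea itself is hypothetical, not code from this branch):

```python
# Hypothetical sketch of a CDashRandomFailures.py-style module: small,
# separately testable pieces broken out of main(), with the
# project-specific behavior injected as a strategy object.

class TrilinosExtractBuildNameStrategy:
  """Project-specific strategy: parse fields out of a Trilinos PR build name."""

  def extractFieldsFromBuildName(self, buildName):
    # e.g. 'PR-12345-test-rhel8_gcc-1234' -> ['PR', '12345', 'test', ...]
    return buildName.split('-')


class RandomFailureAnalyzer:
  """Generic logic that knows nothing about Trilinos."""

  def __init__(self, extractBuildNameStrategy):
    self.extractBuildNameStrategy = extractBuildNameStrategy

  def isRandomFailure(self, failingSha1, passingRetestSha1):
    # A failure looks "random" if the very same source SHA1 both failed
    # and later passed on a retest.
    return failingSha1 == passingRetestSha1
```

A unit test could then construct RandomFailureAnalyzer with a stub strategy and exercise each method directly, while a single cache-mock system test still covers the command-line driver end to end.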

Coming into this, my understanding of testing has been limited, but I think this conversation has been very helpful in shaping the different perspectives around testing.

What do y'all think?

(Also, @bartlettroscoe, I would appreciate the offer of refactoring suggestions you made earlier if you have time.)

bartlettroscoe commented 5 months ago

Is there a reasonable middle ground here for supporting cache-mock tests along with library-function mocking?

@achauphan, what you outline above is exactly what I would recommend and it is what I did for the tool cdash_analyze_and_report.py. You will see there are a lot of unit tests for code called by that tool that typically never create files or touch the filesystem.

Also @bartlettroscoe I would appreciate the refactor suggestions offer you made earlier if you have time.

Yes, I can suggest what to do. But it sounds like @sebrowne wants to start with the minimal work to get this merged and used by Trilinos. For a minimal and easy refactoring (that does not require writing any new tests), what I would suggest would be:

That is a very straightforward refactoring that can be done in stages, using the existing tests to ensure that each refactoring step goes smoothly and with no risk.

What is outlined above is exactly what was done for:

The above refactoring can be done in just a few short hours (or less). We can pair-program this together if you like. I have some availability tomorrow. (My SNL Outlook calendar currently shows as full, but that is just time blocked off for some writing I need to get done for a paper.)

achauphan commented 5 months ago

@bartlettroscoe thanks for the suggestions! I've started a bit on these yesterday before your comment and planned to do a stacked PR. How about I finish going through these suggestions and we review the stacked PR together?

bartlettroscoe commented 5 months ago

@bartlettroscoe thanks for the suggestions! I've started a bit on these yesterday before your comment and planned to do a stacked PR. How about I finish going through these suggestions and we review the stacked PR together?

@achauphan, sounds good. Let me know when you want me to review.

achauphan commented 5 months ago

Force push to fix codespell

achauphan commented 5 months ago

@sebrowne @bartlettroscoe Requesting another review.

The primary change since the last review request was moving away from the script file cdash_analyze_and_report_random_failures.py to a module file CDashAnalyzeReportRandomFailures.py with a driver class CDashAnalyzeReportRandomFailuresDriver containing the same contents as the original script. Running the tool will now be done from a project-specific script based on example_cdash_analyze_and_report_random_failures.py. The example script currently contains placeholder strategy classes that are the ones that would be used for Trilinos.

These are the changes that I hope are enough to stand this up. I did not make any changes related to breaking up main(), or in this case runDriver(), into smaller testable pieces, as that was previously deemed good enough for now. That can perhaps be scheduled later down the road.
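For anyone standing this up, a project-specific driver under this layout might look roughly like the sketch below. Only the module name, driver-class name, and runDriver() come from this PR; the import path, constructor arguments, and strategy class names are assumptions for illustration.

```python
#!/usr/bin/env python3
# Rough sketch of a Trilinos-specific driver script based on
# example_cdash_analyze_and_report_random_failures.py. Only the module,
# driver-class, and runDriver() names come from this PR; the import
# path, constructor arguments, and strategy class names are guesses.

import sys
# Assumes <tribits-dir>/ci_support has already been added to sys.path.
from CDashAnalyzeReportRandomFailures import CDashAnalyzeReportRandomFailuresDriver


class TrilinosVersionInfoStrategy:
  """Placeholder for Trilinos-specific version/SHA1 extraction."""
  pass


class TrilinosExtractBuildNameStrategy:
  """Placeholder for Trilinos-specific PR build-name parsing."""
  pass


if __name__ == '__main__':
  driver = CDashAnalyzeReportRandomFailuresDriver(
    versionInfoStrategy=TrilinosVersionInfoStrategy(),
    extractBuildNameStrategy=TrilinosExtractBuildNameStrategy(),
    )
  sys.exit(driver.runDriver())
```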

bartlettroscoe commented 5 months ago

@achauphan, doing a detailed review now ...

bartlettroscoe commented 5 months ago

@achauphan, while I am doing this review, could you please rebase this branch on top of the upstream 'master' target branch? GitHub is reporting that the branch is out of date with the target branch.

If not, I can take care of that after my review.

NOTE: This is GitHub being super careful to test the exact version of the repo that will be merged.

bartlettroscoe commented 5 months ago

@achauphan, I am okay to merge as is. Let me know and I will hit the merge button.

achauphan commented 5 months ago

@bartlettroscoe thanks for the review and suggestions! Hopefully there is time in the future to implement them.

Please go ahead and merge :)

If you'd like, I can give snapshotting this into Trilinos a try tomorrow, assuming that this is the correct guide.

bartlettroscoe commented 5 months ago

If you'd like, I can give snapshotting this into Trilinos a try tomorrow, assuming that this is the correct guide.

Close, but we use a separate branch to snapshot into and then merge that into the Trilinos 'develop' branch, so as to not copy over local changes to TriBITS in Trilinos. I will do it this time right now and will write up instructions for next time.