
Updated unittestharness to generate junit reports. #287

Closed · gnikit closed 3 years ago

gnikit commented 3 years ago

Fixes: Jenkins unit test step not producing fail report #285

gnikit commented 3 years ago

Something that I think will substantially improve the junit output, but might be a bit controversial due to its implementation: instead of passing

TestCase('test', 'unittest.test_name', ...)

we would do

TestCase('test', 'unittest.root_test_dir.test_name', ...)

So for an error metric test this would be TestCase('test', 'unittest.error_measures.test_name', ...)

This would make the junit report much more informative, since it would be easier to distinguish and locate failing tests. However, it requires knowing the directory where a test is stored, which is currently not possible given that the unittestharness runs all the binaries from /bin/tests. The only scalable solution to this that I could think of is for the unittestharness to traverse a list of directories (the locations of the unit tests) and extract the root_test_dir from the relative path. However, this would mean that tests are not run from bin/ anymore, which I guess some people might not like.
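Roughly, what I have in mind is the following (a sketch only; root_test_dir would still have to be discovered somehow, and the concrete names here are just illustrative):

    from junit_xml import TestCase

    root_test_dir = 'error_measures'   # directory the unittest source lives under
    test_name = 'test_quadrature'      # illustrative test name
    case = TestCase('test', classname=f'unittest.{root_test_dir}.{test_name}',
                    elapsed_sec=0.1)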

gnikit commented 3 years ago

So I had a thought about it a bit more (this time properly), and it turns out we can get the base directory that a unittest belongs to either by following its symlink or, if for some reason someone is running the unittestharness from an unusual location (e.g. error_measures/tests/ or even outside Fluidity), by using all but the /tests/ part of the path. Obviously the second approach is not super robust, but it is only there as a fallback in case someone wants to run a couple of tests from unusual directories.
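Roughly, the sketch I have in mind (assuming pathlib, and that the binaries in bin/tests are symlinks back into the source tree; function and variable names are just illustrative):

    from pathlib import Path

    def root_test_dir(test_binary):
        # Follow the symlink back to e.g. .../error_measures/tests/test_foo
        real = Path(test_binary).resolve()
        parts = real.parts
        if 'tests' in parts:
            return parts[parts.index('tests') - 1]   # e.g. 'error_measures'
        # Weak fallback for tests run from unusual locations
        return real.parent.name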

FYI, my previously suggested approach would not have worked, since it would be quite difficult to distinguish between enabled and disabled tests, or files like test_main.cpp, without replicating in the unittestharness what is already in the Makefiles.

Patol75 commented 3 years ago

Hi @gnikit and @stephankramer, really nice initiative to make sure unittestharness provides a useful report that can be handled by Jenkins; it will definitely be a great resource. If I may, I just wanted to propose a re-write of the Python script, which I found a bit complicated, mainly because of the presence of the two classes I guess. The suggested re-write can be checked at f3731a9. I think I have included most changes mentioned in the discussion so far. Additionally, the xml file also includes error messages - through add_error_info - resulting from errors/warnings during subprocess execution, and log messages - through add_skipped_info - generated by individual tests. Feel free to use it if it looks relevant to you.

gnikit commented 3 years ago

@Patol75 this is so much better than my simplistic solution. I think your approach is much more complete, especially in capturing stderr and stdout from Fluidity itself and not just the unittest output. I am also a big fan of your use of pathlib, which substantially simplifies a couple of spots. Again, very nice work; this would be a much needed improvement for the unittestharness. The only bit that worries me is rmtree(testsDir), which can basically delete any directory. Other than that I think it is really good; if @stephankramer agrees, we can add it in this PR.
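For instance, a small guard along these lines would put my mind at ease (just a sketch, names illustrative):

    from pathlib import Path
    from shutil import rmtree

    def safe_rmtree(tests_dir):
        # Only delete something that actually looks like the unittest staging directory
        tests_dir = Path(tests_dir).resolve()
        if tests_dir.name != 'tests' or tests_dir == Path('/'):
            raise ValueError(f'Refusing to remove unexpected directory: {tests_dir}')
        rmtree(tests_dir)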

stephankramer commented 3 years ago

Looks great @Patol75. So let's integrate that and see where we are...

Patol75 commented 3 years ago

Jenkins returned a failure with regard to the error_type argument I supply to add_error_info. I am slightly confused because the keyword does exist - see here. Any idea?

angus-g commented 3 years ago

Old version of junit-xml on Jenkins? (though that keyword was added 4 years ago, so it'd have to be pretty old!)

Patol75 commented 3 years ago

@tmbgreaves May I ask you to have a look if this is the case?

EDIT: Or potentially someone else?

Patol75 commented 3 years ago

Any update on this, anyone?

gnikit commented 3 years ago

So @angus-g was right: junit-xml on the docker image being used is quite an old version, 1.7 to be exact. Updating junit-xml to 1.9 fixed the problem and resulted in the following test summary:

RESULTS
        Passes: 18031

        Warnings: 2
                ['test_1d_pickers']

        Failures: 133
                ['test_quadrature', 'test_invert']

        Errors: 4
                ['test_quad_supermesh', 'test_pseudo_supermesh', 'test_node_lies_on_boundary', 'test_unify_meshes']

As a side note, it might be worth adding pip in the next rebuild of the docker images, to make situations like this a bit easier to resolve.

Patol75 commented 3 years ago

Thanks @gnikit

On my side, from a local build, I see this:

RESULTS
    Passes: 18034

    Warnings: 2
        ['test_1d_pickers']

    Failures: 137
        ['test_vertical_prolongation_operator', 'test_quadrature', 'test_vertical_integration', 'test_vertical_extrapolation']

    Errors: 0
        []

About test_1d_pickers, I get the following output:

thomas@tristan:~/Softwares/fluidity$ tools/unittestharness.py 
    test_1d_pickers: Running
        Pass: [Picker pointer allocated]
        Pass: [No picker attached]
        Pass: [Point not contained]
        Warn: [Picker cached]; error: Picker not cached
        Warn: [Point contained]; error: Reported incorrect containing element
        Pass: [No references]
RESULTS
    Passes: 4

    Warnings: 2
        ['test_1d_pickers']

    Failures: 0
        []

    Errors: 0
        []

I am correctly picking these two as Warnings, but within the xml file, the following gets written:

        <testcase classname="unittest.femtools.test_1d_pickers" name="Picker pointer allocated" time="0.001841"/>
        <testcase classname="unittest.femtools.test_1d_pickers" name="No picker attached" time="0.001841"/>
        <testcase classname="unittest.femtools.test_1d_pickers" name="Point not contained" time="0.001841"/>
        <testcase classname="unittest.femtools.test_1d_pickers" name="Picker cached" time="0.001841">
            <failure message="Picker not cached" type="failure"/>
        </testcase>
        <testcase classname="unittest.femtools.test_1d_pickers" name="Point contained" time="0.001841">
            <failure message="Reported incorrect containing element" type="failure"/>
        </testcase>
        <testcase classname="unittest.femtools.test_1d_pickers" name="No references" time="0.001841"/>

Both messages are printed as failure messages, which I would say is fine, but I guess the type should be warning, right? This is not difficult to implement, but my question here is: are all Warn messages reported with a '; error:' suffix, similar to Fail messages? I am asking because, if that is not the case, funky stuff could happen.
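To make the question concrete, this is roughly the mapping I have in mind, assuming every Warn line does carry the '; error:' suffix like Fail lines do (sketch only):

    import re

    def record_status(xml_entry, test_output):
        status = test_output.lstrip()[:4]                # 'Pass', 'Warn' or 'Fail'
        match = re.search('(?<=error:).+', test_output)
        if status in ('Warn', 'Fail') and match:
            xml_entry.add_failure_info(
                message=match.group().lstrip() or 'Empty message',
                failure_type='warning' if status == 'Warn' else 'failure')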

For test_invert, it looks to be passing, but I have seen it fail as well, for example:

test_invert: Running
        Pass: [invert 1]
        Pass: [invert 2]
        Fail: [invert 3]; error: Invert should produce inverses.
        Pass: [invert 4]
        Pass: [invert 5]

gnikit commented 3 years ago

About the extra failures: they originate from not finding the Fluidity Python module. In my case, I accidentally reset the PYTHONPATH variable after updating junit-xml. I suspect, @Patol75, that you might have to edit your PYTHONPATH to point to fluidity/python. With that fixed, the docker container produces:

RESULTS
        Passes: 18039

        Warnings: 2
                ['test_1d_pickers']

        Failures: 132
                ['test_quadrature']

        Errors: 0
                []

which I think is what we should be getting. I can also verify that test_invert behaves inconsistently when run locally; possibly round-off errors?

gnikit commented 3 years ago

The other thing I was thinking would be useful to have would be the options to:

Patol75 commented 3 years ago

Hey @gnikit,

It is not clear to me why you need to manually set the environment variables to get it to work: isn't the code meant to set them anyway? Additionally, it looks like subprocess should inherit any set variable:

If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

I have not investigated further for test_invert, but glad you noticed the same behaviour.

Your two suggestions make a lot of sense; thanks for that.

I attach a potential re-write, which tries to deal differently with the presence or absence of junit-xml, and also includes a few improvements here and there. Please let me know if you think it is sensible. unittestharness.py.txt

tmbgreaves commented 3 years ago

Apologies for the slow response on this one - I've updated the Docker Bionic base image with the junit.xml package from Groovy which gives v1.9. I've also pulled all packages up to the latest bionic versions and installed python3-pip.

gnikit commented 3 years ago

Apologies for the slow response on this one - I've updated the Docker Bionic base image with the junit.xml package from Groovy which gives v1.9. I've also pulled all packages up to the latest bionic versions and installed python3-pip.

Thanks a lot! I restarted Jenkins, hopefully now everything works.

gnikit commented 3 years ago

Hi @Patol75 ,

Sorry for the late reply, completely forgot about this PR.

It is not clear to me why you need to manually set the environment variables to get it to work: isn't the code meant to set them anyway? Additionally, it looks like subprocess should inherit any set variable:

So I am not sure either. I know from experience that if you run the old unittestharness without appending to your LD_LIBRARY_PATH and setting PETSC_DIR/PETSC_ARCH, unittests that call PETSc directly will fail. I can try removing them and see what happens.

Possibly, with the new unittestharness and the use of subprocess, the explicit PATH handling might not be necessary anymore.
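For reference, the kind of explicit environment setup I mean is roughly the following (paths are illustrative, and since subprocess.run inherits os.environ by default this may well turn out to be redundant):

    import os
    import subprocess

    env = os.environ.copy()
    env['LD_LIBRARY_PATH'] = '/path/to/fluidity/lib:' + env.get('LD_LIBRARY_PATH', '')
    env.setdefault('PETSC_DIR', '/path/to/petsc')

    subprocess.run('./test_quadrature', cwd='bin/tests', env=env, encoding='utf-8',
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)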

I have not investigated further for test_invert, but glad you noticed the same behaviour.

I think what we are seeing are rounding errors, because the tolerance for the test is 100*epsilon(0.0), which should be ~1e-14. Although I am not entirely sure.
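As a quick sanity check of that magnitude (assuming the test really uses double precision, i.e. Fortran epsilon(0.0d0) rather than single precision):

    import numpy as np

    print(100 * np.finfo(np.float64).eps)   # ~2.2e-14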

I attach a potential re-write, which tries to deal differently with the presence or absence of junit-xml, and also includes a few improvements here and there. Please let me know if you think it is sensible. unittestharness.py.txt

Like before, very nice work. My main concern is the code duplication between unitTestHarness() and unitTestHarnessNO(). Could we not achieve the same result by making xml_output optional and only doing any junit things if the string is not empty? I also think it would be good to have an exception check for the file name. Other than those two things, it looks great!

Here is a diff with my proposed changes on your unittestharness.py.txt.

34,52c34,37
< def unitTestHarnessNO(tests):
<     testsResults = {'Pass': Counter(), 'Warn': Counter(), 'Fail': Counter()}
<     errorList, skipList = [], []
< 
<     for test in tests:
<         print(f'\t{test.name}: Running')
<         testProc = subprocess.run(
<             './' + test.name, shell=True, cwd=test.parent, encoding='utf-8',
<             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
< 
<         if re.search(r'^\s*error[:\s]', testProc.stderr,
<                      re.IGNORECASE | re.MULTILINE):
<             print(f'ERROR: {test.name} failed')
<             errorList.append(test.name)
<             continue
<         elif ': not found' in testProc.stderr:
<             print(f'WARNING: {test.name} not found')
<             skipList.append(test.name)
<             continue
---
> def unitTestHarness(tests, xml_outfile=''):
>     xmlParser = None
>     if xml_outfile:
>         xmlParser = TestSuite('UnitTestHarness')
54,67d38
<         for testOutput in testProc.stdout.splitlines():
<             try:
<                 testsResults[testOutput[:4]][test.name] += 1
<             except KeyError:
<                 print(testOutput.lstrip())
<                 continue
< 
<             print(f'\t\t{testOutput}')
< 
<     printResults(testsResults, errorList, skipList)
< 
< 
< def unitTestHarness(tests, xml_outfile):
<     xmlParser = TestSuite('UnitTestHarness')
85,88c56,61
<             xmlEntry = TestCase('', classname=testId,
<                                 elapsed_sec=wallTime, status='Error')
<             xmlEntry.add_error_info(message=testProc.stderr)
<             xmlParser.test_cases.append(xmlEntry)
---
> 
>             if xml_outfile:
>                 xmlEntry = TestCase('', classname=testId,
>                                     elapsed_sec=wallTime, status='Error')
>                 xmlEntry.add_error_info(message=testProc.stderr)
>                 xmlParser.test_cases.append(xmlEntry)
93,96c66,70
<             xmlEntry = TestCase('', classname=test.name,
<                                 elapsed_sec=wallTime, status='Skipped')
<             xmlEntry.add_skipped_info(message='Not found')
<             xmlParser.test_cases.append(xmlEntry)
---
>             if xml_outfile:
>                 xmlEntry = TestCase('', classname=test.name,
>                                     elapsed_sec=wallTime, status='Skipped')
>                 xmlEntry.add_skipped_info(message='Not found')
>                 xmlParser.test_cases.append(xmlEntry)
110,137c84,112
<             try:
<                 testMsg = re.search(r'(?<=\[).+(?=\])', testOutput).group()
<             except AttributeError:
<                 testMsg = testOutput[6:].split('; error:')[0]
< 
<             xmlEntry = TestCase(testMsg, classname=testId,
<                                 elapsed_sec=wallTime,
<                                 status=testOutput[:4])
< 
<             try:
<                 failMsg = re.search('(?<=error:).+', testOutput).group()
<                 if not failMsg.lstrip():
<                     failMsg += 'Empty message'
<                 if testOutput[:4] == 'Warn':
<                     xmlEntry.add_failure_info(message=failMsg.lstrip(),
<                                               failure_type='warning')
<                 else:
<                     xmlEntry.add_failure_info(message=failMsg.lstrip())
<             except AttributeError:
<                 pass
< 
<             if testProc.stderr:
<                 xmlEntry.add_error_info(message=testProc.stderr,
<                                         error_type='warning')
< 
<             if otherOut:
<                 xmlEntry.stdout = otherOut[:-1]
<                 otherOut = ''
---
>             if xml_outfile:
>                 try:
>                     testMsg = re.search(r'(?<=\[).+(?=\])', testOutput).group()
>                 except AttributeError:
>                     testMsg = testOutput[6:].split('; error:')[0]
> 
>                 xmlEntry = TestCase(testMsg, classname=testId,
>                                     elapsed_sec=wallTime,
>                                     status=testOutput[:4])
> 
>                 try:
>                     failMsg = re.search('(?<=error:).+', testOutput).group()
>                     if not failMsg.lstrip():
>                         failMsg += 'Empty message'
>                     if testOutput[:4] == 'Warn':
>                         xmlEntry.add_failure_info(message=failMsg.lstrip(),
>                                                   failure_type='warning')
>                     else:
>                         xmlEntry.add_failure_info(message=failMsg.lstrip())
>                 except AttributeError:
>                     pass
> 
>                 if testProc.stderr:
>                     xmlEntry.add_error_info(message=testProc.stderr,
>                                             error_type='warning')
> 
>                 if otherOut:
>                     xmlEntry.stdout = otherOut[:-1]
>                     otherOut = ''
139c114
<             xmlParser.test_cases.append(xmlEntry)
---
>                 xmlParser.test_cases.append(xmlEntry)
143,144c118,120
<     with open(xml_outfile, 'w') as fid:
<         xmlParser.to_file(fid, [xmlParser])
---
>     if xml_outfile:
>         with open(xml_outfile, 'w') as fid:
>             xmlParser.to_file(fid, [xmlParser])
190,191c166,170
<         with open(args.from_file, 'r') as fid:
<             tests = [testsDir / testName.rstrip() for testName in fid]
---
>         try:
>             with open(args.from_file, 'r') as fid:
>                 tests = [testsDir / testName.rstrip() for testName in fid]
>         except IOError:
>             sys.exit(f'Could not read file: {args.from_file}\n')
206c185
<         unitTestHarnessNO(tests)
\ No newline at end of file
---
>         unitTestHarness(tests)

I think if you agree with my changes, the easiest thing to do would be to push unittestharness.py.txt to this PR, like before, and then I can apply the patch shown above, so that we can hand it off to @stephankramer for another check.

gnikit commented 3 years ago

Hi @tmbgreaves, I was wondering if you could have a look at what is happening with Jenkins. It looks like Jenkins is not being set up from the Jenkinsfile of my branch but from master, so it keeps using the bionic-python3 docker image, which has not been updated.

If that's the case, could someone with write access to the Fluidity repo cherry-pick 7b792a037f020552c6a21644e5bf3b148e3d186f into another PR/master so that the updated Jenkinsfile is used?

Patol75 commented 3 years ago

@gnikit Thanks for the feedback! Regarding your two proposed changes, I had originally incorporated them both, but I finally decided to remove them.

I agree with you that it is better to have one function instead of two. However, on the other side of things, I find it quite annoying that we have to evaluate a condition potentially thousands of times, if not more, within this single function when we actually know the result of the evaluation even before the function is called and it will not change at all during execution. I have thought of using decorators to keep a single function, but it looks way too complicated considering all the locations where the condition is evaluated within the function. In the end, I think it is simpler to have two functions, and it is not too much of a drawback as the case without junit_xml results in a 30-line function. Nonetheless, it is true that code duplication means we have to update two functions instead of one if there is any change that impacts the NO function. But again, that is only a 30-line function.

About checking that the path provided by from_file leads to a valid file in the system, I think it is not really necessary because the open call will be quite explicit about it anyway; it will raise a FileNotFoundError.

I will push the file here, and then back to you for any patching you think is still necessary. :)

tmbgreaves commented 3 years ago

I think that's a very reasonable one-line direct fix into master for the docker base image - I'll push the change.

gnikit commented 3 years ago

I think that's a very reasonable one-line direct fix into master for the docker base image - I'll push the change.

Something seems to have gone wrong on master when calling junit at the very end. Do you have any idea why that is?

stephankramer commented 3 years ago

So what's going on is the following: it is now failing the diamond_validation medium test (see the Tests tab on top if you're using the Blue Ocean theme of Jenkins). To see why it has failed you have to open up the "build" log of "make mediumtest" and search for diamond_validation. Unfortunately the actual Python backtrace is out of sync with the rest of the testharness output - search for "Traceback" to find:

Traceback (most recent call last):
  File "/home/jenkins/workspace/fluidity_master/python/fluidity/regressiontest.py", line 304, in run_python
    exec(self.code, varsdict)
  File "<string>", line 10, in <module>
  File "../../libspud/diamond/diamond/schema.py", line 28, in <module>
    from . import preprocess
  File "../../libspud/diamond/diamond/preprocess.py", line 25, in <module>
    from future.standard_library import install_aliases
ModuleNotFoundError: No module named 'future'

So diamond_validation uses some diamond python code which appears to rely on the future module. It seems that we have a missing dependency in our fluidity-dev package which should depend on python3-future. The fluidity package depends on spud-diamond which does depend on python3-future, but obviously you don't need to have the fluidity package installed just to build and test it.

@tmbgreaves : could we pester you once more and add python3-future to fluidity-dev and install it in the bionic base image?

tmbgreaves commented 3 years ago

@stephankramer - done, thanks for the heads-up. Pulling the new image to the builders now. Note to future self that when builder 2 comes back online we'll need to manually pull the latest image there, as Jenkins doesn't update images automatically.

Patol75 commented 3 years ago

It looks like we finally have a Green! Yay!

Results are as expected:

###########
# Results #
###########

    Passes: 18039

    Warnings: 2
        ['test_1d_pickers']

    Failures: 132
        ['test_quadrature']

    Errors: 0
        []

    Skipped: 0
        []

However, I am not sure Jenkins picked up the xml file for the unittest results. I see detailed results for short and medium tests, but I cannot find the unittests. Would that explain why we are getting a Green and not a Red? Because, in the end, some tests did fail, right?

Also, in the Artifacts tab associated with this build (Blue Ocean), there are 5 Tracebacks, all of which report:

Traceback (most recent call last):
  File "./plotfs_detec.py", line 7, in <module>
    import commands
ModuleNotFoundError: No module named 'commands'

A quick search on PyPI shows that there is no such "commands" package. I am not too sure if it is important, though.

gnikit commented 3 years ago

@Patol75 I think this has to do with how Jenkins is configured. You can see that even though I have updated our Jenkinsfile, Jenkins still runs from the old one, i.e. the Jenkinsfile from master. Not sure why that is, but I am pushing a workaround.

EDIT: I think because this is a fork, Jenkins only runs the PR-merge of this branch with Fluidity master and not the branch itself.

stephankramer commented 3 years ago

@Patol75 : re. the failed imports of "commands". This is a deprecated module in the stdlib of python2: https://docs.python.org/2/library/commands.html which was imported but not actually used in some plotting scripts in 4 wetting and drying tests. I have just fixed this on master.

stephankramer commented 3 years ago

@Patol75 I think this has to do with how Jenkins is configured. You can see that even though I have updated our Jenkinsfile, Jenkins still runs from the old one, i.e. the Jenkinsfile from master. Not sure why that is, but I am pushing a workaround.

EDIT: I think because this is a fork, Jenkins only runs the PR-merge of this branch with Fluidity master and not the branch itself.

The PR-merge is master with this branch merged in (i.e. the same thing that would happen when we actually do the merge), so it should include all changes you make to the Jenkinsfile in this branch. I suspect you can only give one junit command, and should combine the two into one with all xml files as arguments? Otherwise it may just report what is given in the last command, i.e. it overrides the earlier one. In any case your "workaround" seems like a cleaner solution - better to have all junit files in one place.
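(For the record, my guess is that the combined call would just be a single junit step with a comma-separated list of patterns, something like junit 'unittest_report.xml, test_results/*.xml' - the exact paths here are illustrative, I have not checked them against the current Jenkinsfile.)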

Patol75 commented 3 years ago

@gnikit @stephankramer Thank you for the detailed explanations. We now have the Red we are supposed to get. Or almost, I would say! It turns out that the current way I am outputting Warnings results in Jenkins counting them as test failures. There are actually two kinds of Warnings: ones that result from the execution of the flml associated with the test, and ones that result from the actual evaluation of a test within a test (Inception, Fluidity's version). What kind of behaviour do we want here? I believe that the former kind of Warning should not be reported as an error (hence requiring a small patch), while the latter should be (current behaviour). If we agree on that, I will push the relevant changes.

@stephankramer Looking at the Jenkins report on master after your push yesterday, we now have Tracebacks similar to:

Traceback (most recent call last):
  File "./plotfs_detec.py", line 168, in <module>
    main()
  File "./plotfs_detec.py", line 71, in main
    s = stat_parser(filename)
  File "/home/jenkins/workspace/fluidity_master/python/fluidity_tools.py", line 325, in __init__
    statfile=open(filename, "r")
FileNotFoundError: [Errno 2] No such file or directory: 'balzano2_2plus1.detectors'

Once more, I am not entirely sure what is going on, but judging from the log, it is possible that the Python scripts get executed after the rm calls occur, but before the actual execution of the tests, which would explain why the file does not exist. Another potential lead is the rm call: rm -f *.ele *.edge *.node *.poly *vtu *.s *.d.1 *.stat *.msh *.detectors.h5part matrixdump* *.pdf *.png *.dat. I do not see any *.detectors, but there is a *.detectors.h5part. That contradicts what I have just inferred, but then potentially the filename should be balzano2_2plus1.detectors.h5part? Hope that helps.

gnikit commented 3 years ago

Once again @Patol75 sorry for the late reply, completely forgot about this PR.

What kind of behaviour do we want here? I believe that the former kind of Warning should not be reported as an error (hence requiring a small patch), while the latter should be (current behaviour).

I think you are right about this. The Fluidity Warnings/Errors placed in testProc.stderr cannot easily be sanitised, so redirecting the stderr to stdout would be a viable fix IMO; something along the lines of

if testProc.stderr:
    otherOut += testProc.stderr.lstrip()

instead of calling add_error_info() works for me (not sure if there is a more elegant fix for this through junit-xml itself).

Patol75 commented 3 years ago

@gnikit No worries, I think it is good to go now. @stephankramer Mind having a last look? We will just have to revert the 'test changes' made to the two or three unittests before we merge.

Patol75 commented 3 years ago

Thanks @stephankramer. Would you mind elaborating on how a class structure could help avoid the use of logic branches? I am more than happy to give a quick re-write of the whole script.

Patol75 commented 3 years ago

I hope I did not break anything, but that should be the rebase.

Patol75 commented 3 years ago

It looks like this is good to go @stephankramer.
