add regex option - Githubissues

gro1m commented 2 years ago

Adds a regex option.

Examples:

$ pycobertura show coverage.xml --regex "^c"
Filename        Stmts    Miss  Cover    Missing
------------  -------  ------  -------  ---------
cli.py             78       0  100.00%
cobertura.py      206       0  100.00%

pycobertura show coverage.xml --regex "^u"
Filename      Stmts    Miss  Cover    Missing
----------  -------  ------  -------  ---------
utils.py        118       1  99.15%   38
TOTAL           695       1  99.86%

Did not test diff yet, there are also no tests yet...

Just wanted to get your opinion on it, @aconrad. I tried to keep the logic simple and here it is for including the files you want but you could also write the regex as such as to exclude the files you do not want. It is a bit complicated, but it works:

pycobertura show coverage.xml --regex '^(?!u|\\.).*'
Filename                 Stmts    Miss  Cover    Missing
---------------------  -------  ------  -------  ---------
reporters.py               192       0  100.00%
__init__.py                  2       0  100.00%
filesystem.py               87       0  100.00%
templates/__init__.py        0       0  100.00%
cobertura.py               206       0  100.00%
templates/filters.py        12       0  100.00%
cli.py                      78       0  100.00%
TOTAL                      695       1  99.86%

The single quotes are absolutely necessary, otherwise one gets:

bash: !u: event not found

aconrad commented 2 years ago

Can you update from master? I enabled Github Actions although we don't have the coverage pipeline working for now, will have to do that subsequently. See #139.

gro1m commented 2 years ago

Can you update from master? I enabled Github Actions although we don't have the coverage pipeline working for now, will have to do that subsequently. See #139.

Yes, done right now and at least the checks seem to be back again and I think they should fail :)

gro1m commented 2 years ago

@aconrad The tests fail, because the Main class is shown twice. Indeed, the Main class occurs twice in cobertura.xml, so is the test actually correct and should we modify the assertions. Could you please check?

aconrad commented 2 years ago

Hm, this cobertura.xml file was taken from the internet somewhere, I didn't generate it – I was looking for a sample cobertura file to use for testing when I started pycobertura. But I did a git blame on it and noticed that this file was tampered with in 1fb4d0b1 (PR https://github.com/aconrad/pycobertura/pull/82), was a bug introduced there?

gro1m commented 2 years ago

Hi @aconrad Thanks for checking! I think for now it would be best to remove these lines (maybe you could open a separate PR so that we have this well tracked). And in future it might be good if we could generate these files ourselves, but I think this would be an enhancement that we might not want to do right now... What do you think?

aconrad commented 2 years ago

I think for now it would be best to remove these lines (maybe you could open a separate PR so that we have this well tracked).

I'll have to take a closer look at why we have a filename that doesn't look like a path, I'm not familiar with this syntax. Is it a Java thing? We can follow up in the original PR and ask the author.

And in future it might be good if we could generate these files ourselves, but I think this would be an enhancement that we might not want to do right now...

I didn't want to generate my own to ensure that pycobertura wasn't biased parsing a non standard file format (e.g. as you noticed, even the coveragepy-generated files are broken by omitting the directory in the filename attributes). The original coverage schema standard is not very precise in describing what the value of each attribute should be.

What do you think?

I would encourage introducing files generated by other languages. But if we were to generate our own, how would we go about it? Do you mean the ones generated by the pycobertura project? What value would it bring vs the ones we already have? Some of the sample coverage files in the tests directory were generated ourselves.

gro1m commented 2 years ago

I think for now it would be best to remove these lines (maybe you could open a separate PR so that we have this well tracked).

I'll have to take a closer look at why we have a filename that doesn't look like a path, I'm not familiar with this syntax. Is it a Java thing? We can follow up in the original PR and ask the author.

In the end, I need your decision on this: either we count double and leave the file as is or we exclude these lines. As it stands now, the tests will have to fail... For me there would be less confusion leaving them out, but in the end it does not matter much...

As a side-note: Also this file contains a method section, so I am a bit surprised about this issue: https://github.com/aconrad/pycobertura/issues/118.

gro1m commented 2 years ago

What do you think?

I would encourage introducing files generated by other languages. But if we were to generate our own, how would we go about it? Do you mean the ones generated by the pycobertura project? What value would it bring vs the ones we already have? Some of the sample coverage files in the tests directory were generated ourselves.

Actually we would need to test some popular coverage report generators always on the same code logic but in different languages. Yes, I agree this would be challenging and I currently have no idea on how to go about this, but maybe someone else from the community could support here. Also the code logic would have to be small, but still relevant. The only reason I brought it up is that if the truth lies in the project's code, it is much easier to understand the code and why certain changes have evolved over time.

gro1m commented 2 years ago

@aconrad I think I still need your decision here about the cobertura.xml, i.e. correcting the tests to account for the duplicate Main.java or removing the corresponding line from the cobertura.xml...

aconrad commented 2 years ago

The method Cobertura.files() was designed to deduplicate filenames if they appeared multiple times and we should retain that behavior since that seems to be a valid use case (the motivation for #82).

There can be multiple classes in the same filename, and since the <class> element is meant to represent a class object, it's possible to find a different class within the same file. It makes sense to me.

gro1m commented 2 years ago

The method Cobertura.files() was designed to deduplicate filenames if they appeared multiple times and we should retain that behavior since that seems to be a valid use case (the motivation for #82).

There can be multiple classes in the same filename, and since the <class> element is meant to represent a class object, it's possible to find a different class within the same file. It makes sense to me.

Ok, but should we then not include all of them, i.e. does it not work against https://github.com/aconrad/pycobertura/pull/132.

For now, I reverted the logic so that this PR should be good to review, but I am not sure I quite understood the above...

aconrad commented 2 years ago

but I am not sure I quite understood the above...

My understanding is that if we have a file foods.py, this file could have class Apple: ... and class Banana: .... The coverage tool could generate the following report:

<coverage>
  <packages>
    <package ...>
      <classes>
        <class name="Apple" filename="foods.py" ...>
          <methods>...</methods>
        </class>
        <class name="Banana" filename="foods.py" ...>
          <methods>...</methods>
        </class>
      </classes>
    </package>
  </packages>
</coverage>

When we call Cobertura.files(), the expectation is that ["foods.py"] would be returned, not ["foods.py", "foods.py"]. This is why #82 was implemented.

Does it add clarity?

aconrad commented 2 years ago

Hm. Let me take a closer look at #132, I haven't looked into it yet.

gro1m commented 2 years ago

Hi @aconrad After thinking more about it now, I understood that the current implementation is indeed correct, because we measure the stats on a file-based level. The only question one might ask is why are we parsing on \\class and not on something like filename or file path(because there could be 2 files with same name in different directories), as we have file-based stats and not class-based stats: is this not possible within lxml? And also one could discuss if we should offer different views, e.g. directory level, file level, class level, method level, maybe even a tree. But I think the discussion now diverges from what the PR additionally wanted to address and should probably take place in a different issue, if you think it could be worth it to discuss this further.

aconrad commented 2 years ago

The only question one might ask is why are we parsing on \\class and not on something like filename or file path(because there could be 2 files with same name in different directories), as we have file-based stats and not class-based stats: is this not possible within lxml?

Great question. When I intially implemented pycobertura, I used coverage reports generated by coveragepy as my sample coverage files. And we can tell from the report that 1 class == 1 file. Here's a snippet coverage from tests/dummy.original.xml:

<class branch-rate="0" complexity="0" filename="dummy/__init__.py" line-rate="0" name="dummy/__init__">
        <methods/>
        <lines/>
</class>
<class branch-rate="0" complexity="0" filename="dummy/dummy.py" line-rate="0.5" name="dummy/dummy">
        <methods/>
        <lines>
                <line hits="1" number="1"/>
                <line hits="0" number="2"/>
                <line hits="1" number="4"/>
                <line hits="0" number="5"/>
        </lines>
</class>

So that's why the implementation is like that.

But over time, people have reported issues because their coverage files (generated by other tools/languages) aren't properly parsed by pycobertura (e.g. #82). Pycobertura is meant to be a Cobertura file parser and be language agnostic.

And also one could discuss if we should offer different views, e.g. directory level, file level, class level, method level, maybe even a tree. But I think the discussion now diverges from what the PR additionally wanted to address and should probably take place in a different issue, if you think it could be worth it to discuss this further.

I agree.

aconrad commented 2 years ago

This looks good, thanks! Can we update CHANGES file? Also, should we add a note somewhere in the README about this new option?

gro1m commented 2 years ago

This looks good, thanks! Can we update CHANGES file? Also, should we add a note somewhere in the README about this new option?

done, can you please check again :)

aconrad / pycobertura

add regex option #138