abduld / include-what-you-use

Automatically exported from code.google.com/p/include-what-you-use
Other
0 stars 0 forks source link

pragma keep doesn't prevent removal of dublicates #140

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Given
==> double.c <==
int main()
{
    int a;
#include "double.h" // IWYU pragma: keep
#include "double.h" // IWYU pragma: keep
}
==> double.h <==
a+=1;

IWYU suggests double.h to be removed. The pragma should prevent also duplicates 
from being removed.

Original issue reported on code.google.com by rol...@rschulz.eu on 30 May 2014 at 8:07

GoogleCodeExporter commented 9 years ago
Thanks, this came up in issue #74, but it needs its own work item.

Original comment by kim.gras...@gmail.com on 30 May 2014 at 8:17

GoogleCodeExporter commented 9 years ago
I'm working on this.

Original comment by kim.gras...@gmail.com on 31 May 2014 at 10:13

GoogleCodeExporter commented 9 years ago
Here's a patch that doesn't solve the problem :-) I thought I'd start by 
refactoring the affected code so as to make this easy to fix.

Previously, the elimination of duplicate includes happened implicitly because 
each suggested headers/decls was mapped to a single corresponding include or 
forward decl, even if there might be several providing the same symbol. This 
had the side-effect that all but the last duplicated include disappeared.

I've now updated CalculateDesiredIncludesAndForwardDeclares to keep multimaps 
for these reverse lookups from symbol use to symbol provider. The function now 
maintains all providers for a symbol, and ends with explicitly clearing the 
desired flag for all but the first one 
(ClearDesiredForSurplusIncludesOrForwardDeclares.)

If this is an acceptable direction, I thought I'd modify ClearDesired... in a 
separate change to leave pragma-keep-marked headers alone.

CalculateDesiredIncludesAndForwardDeclares is now more complicated, and I've 
added support functions here and there. I'm not extremely happy with the 
details of all this, but I think the general approach is correct, i.e. to clean 
out duplicates as an explicit step.

Feedback welcome.

Original comment by kim.gras...@gmail.com on 1 Jun 2014 at 7:01

Attachments:

GoogleCodeExporter commented 9 years ago
How exactly do you want to indicate that a line should be kept due to IWYU 
pragma: keep?  

I had an idea to add OneIncludeOrForwardDeclareLine::is_explicitly_kept.  Then 
during emitting recommendations check line.is_desired() || 
line.is_explicitly_kept().  This way is_desired and is_explicitly_kept can 
coexist independently unaware of each other presence.  And in 
CalculateDesiredIncludesAndForwardDeclares in include_map and fwd_decl_map just 
keep an index of the first OneIncludeOrForwardDeclareLine.  In this case we 
don't need multimap.  Unfortunately, this approach gets a little bit ugly in 
case of

    #include "foo.h"
    #include "foo.h"  // IWYU pragma: keep

My first idea was to store in include_map the index of the first explicitly 
kept OneIncludeOrForwardDeclareLine or just the first line if none is 
explicitly kept.

Cleaning duplicates as a separate step is a good approach, unless we need 
specifically to collect these duplicates.  Currently, introducing multimaps 
only to keep the first include looks like an overkill.  Can you explain your 
next steps a little bit more?

Original comment by vsap...@gmail.com on 5 Jun 2014 at 3:33

GoogleCodeExporter commented 9 years ago
OK, so instead of marking all lines as desired, we just avoid removing 
pragma-keep-marked lines irrespective of whether they're desired or not? That 
sounds easier, but it spreads the policy decision around -- I think it's nice 
to keep the rules for what's desired and why in this function. Then the diff 
emitter can just operate based on desired vs present. I don't know if it's that 
clear-cut at the moment, but I think it's a healthy direction.

> [...] unless we need specifically to collect these duplicates.

I don't understand what you mean by this.

> Can you explain your next steps a little bit more?

I thought I'd do the following:

- Like you suggest, add a boolean flag to OneIncludeOrForwardDeclareLine, 
indicating whether it's marked with "IWYU pragma: keep".
- Check for the flag in the new (awkwardly named)  
ClearDesiredForSurplusIncludesOrForwardDeclares method, and just not clear 
desired for explicitly kept lines.

I don't love the multimaps either, but I think they express the relationship 
correctly -- any use requires a header or a forward decl. We map each such used 
header or decl back to any number of written includes or fwd-decls in the file.

Once we have that information, we can make policy decisions for each individual 
include/fwd-decl.

We want to make sure we keep the right #include in a case like this, for 
example:

  #include "foo.h"

  Foo from_foo_h;

  #include "foo.h"

And actually, in the case below we need to keep both:

  #include "foo.h"

  Foo from_foo_h;

  #include "foo.h"  // IWYU pragma: keep

So it might be that we should leave any one #include with a location before all 
uses + all explicitly kept ones.

Not sure this makes a good case for multimaps as such, but I find it hard to 
separate the concerns of representing the state and applying the rules without 
them.

Original comment by kim.gras...@gmail.com on 5 Jun 2014 at 9:22

GoogleCodeExporter commented 9 years ago
Now I see that picking a correct line requires all duplicate lines.  And 
collecting all lines in multimap is justified.

Patch comments:
- clang complains about missing 'typename' prior to dependent type at 
iwyu_output.cc:1414, :1418.  I.e. it should be "typename 
ContainerType::iterator"
- have you considered using range-based for loops?  I am not eager to convert 
all for loops in the project to range-based loops right now.  But we can start 
using them in the new code.

Original comment by vsap...@gmail.com on 7 Jun 2014 at 1:13

GoogleCodeExporter commented 9 years ago
Oh, and maybe in Contains we can use any_of() from <algorithm>?

Original comment by vsap...@gmail.com on 7 Jun 2014 at 6:33

GoogleCodeExporter commented 9 years ago
Thanks for the feedback!

> - clang complains about missing 'typename' prior to dependent type at
> iwyu_output.cc:1414, :1418.  I.e. it should be "typename 
ContainerType::iterator"

Embarrassing. This has got to be one of MSVC's least helpful 
incompatibilities/extensions. Done.

> - have you considered using range-based for loops?  I am not eager 
> to convert all for loops in the project to range-based loops right
> now.  But we can start using them in the new code.

I hadn't but I tried in the new multimap traversal code. Unfortunately it seems 
that multimaps and range-based for loops are not friendly because std::multimap 
doesn't expose (as far as I can tell) any way to get a range for a specific 
key. The equal_range method returns a pair of iterators, which in turn doesn't 
qualify as a range :-/

> Oh, and maybe in Contains we can use any_of() from <algorithm>?

Nice, I didn't know about any_of. Done.

I'm not entirely happy with the matches() methods in 
OneIncludeOrForwardDeclareLine. Is there a better name? contains? wraps? holds? 
denotes?

Do you mind if I commit this refactoring first, and then work on the multiple 
pragma-keep based on it?

Original comment by kim.gras...@gmail.com on 8 Jun 2014 at 10:01

Attachments:

GoogleCodeExporter commented 9 years ago
I was thinking about OneIncludeOrForwardDeclareLine::matches() name and noticed 
that Contains has linear complexity.  I don't know if it matters, but I have to 
share this observation with you.

Range-based loops can be easily used instead of Each.  And for multimaps some 
simple custom ranges can be introduced.  We can reuse llvm/ADT/iterator_range.h 
or create something similar ourselves.

A few for loops can be modified so that end iterator is calculated only once.  
For example, in ClearDesiredForSurplusIncludesOrForwardDeclares 
"container.upper_bound(k->first)" is calculated at each cycle loop.

I suggest to rename ClearDesiredForSurplusIncludesOrForwardDeclares to 
ClearDesiredForDuplicateIncludesOrForwardDeclares and move it inside namespace 
"internal".

> Do you mind if I commit this refactoring first, and then work on the multiple 
pragma-keep based on it?
I'll review the patch so it can be committed separately.

Original comment by vsap...@gmail.com on 8 Jun 2014 at 4:57

GoogleCodeExporter commented 9 years ago
Yes, Contains is linear. I think it needs to be, in a sense, because we want to 
preserve the order of lines (i.e. we can't sort them and use a binary search). 
It might turn into a problem, because Contains will be called as many times as 
there are uses in a file, and operate on a vector of length (include lines + 
fwd decls). The number of includes & fwd decls will probably stay <100, but the 
number of uses feels like it could explode. We may want to revisit this.

I've updated most loops to be range-based instead, and the code is much clearer.

I didn't use range-based for for multimaps, because it never turned out nice. 
Instead I use `auto` to get rid of the iterator type names, please take a look.

I now cache the upper_bound in ClearDesiredForSurplusIncludesOrForwardDeclares.

Moved ClearDesired... and Contains into namespace internal. I prefer "Surplus" 
over "Duplicate", because duplicates will be allowed once we take pragma keep 
into account.

New patch attached with these things fixed.

Original comment by kim.gras...@gmail.com on 16 Aug 2014 at 9:33

Attachments:

GoogleCodeExporter commented 9 years ago
The patch looks good to me. Storing pointers to OneIncludeOrForwardDeclareLine 
in vector is quite fragile - adding a new item to the vector can invalidate all 
those pointers. But we can live with that, just need to be careful.

You've convinced me to use "Surplus".

Original comment by vsap...@gmail.com on 23 Aug 2014 at 7:50

GoogleCodeExporter commented 9 years ago
The last patch is still the refactoring without solving the issue, correct?

Original comment by rol...@rschulz.eu on 25 Aug 2014 at 3:44

GoogleCodeExporter commented 9 years ago
Roland: yes, that's correct. It should be trivially fixable on top of the 
latest patch, but I wanted to isolate the refactoring from the fix. I'll try 
and get the refactoring in tonight, and then post a new patch for fixing this 
issue.

Original comment by kim.gras...@gmail.com on 25 Aug 2014 at 6:48

GoogleCodeExporter commented 9 years ago
It turned out not to be trivial, but pretty easy. I only just got around to 
finishing the patch today.

I'm not entirely happy with how many places this patch touches, but the updated 
test passes fine here.

Please take a look.

Original comment by kim.gras...@gmail.com on 2 Sep 2014 at 7:30

Attachments:

GoogleCodeExporter commented 9 years ago
Fixes it for me both in the small test case and the original source file.

Original comment by rol...@rschulz.eu on 4 Sep 2014 at 12:26

GoogleCodeExporter commented 9 years ago
Overall the patch looks good, have some test case questions. How should we 
handle the following cases?

  #include "foo.h"
  #include "foo.h"  // IWYU pragma: keep

  #include "bar.h"  // IWYU pragma: keep
  #include "bar.h"
  #include "bar.h"  // IWYU pragma: keep

Also I was thinking about verifying in tests which exactly line is suggested to 
be deleted and I don't have a solution.  difflib doesn't support custom line 
comparators and it doesn't look like a trivial problem.  If you have any ideas, 
I'll be glad to here them.

Something to consider - we might have in the future other reasons to keep 
surplus includes, not just "keep" pragma. For example, we should keep x-macros 
includes. I am comfortable with the current approach, it shouldn't be hard to 
extend it to support other reasons to keep surplus includes.  Just wanted to 
bring it to your attention.

Original comment by vsap...@gmail.com on 7 Sep 2014 at 5:30

GoogleCodeExporter commented 9 years ago
I'm not sure what the policy should be, actually.

Currently we always keep the first include (with or without pragma:keep) and 
any subsequent ones with pragma:keep.

It creates slightly surprising behavior in your test cases above (all "foo.h" 
are retained, the middle "bar.h" is removed.)

I supposed we could make sure to keep at most as many includes as there are 
pragma:keep directives (i.e. remove both the unmarked "foo.h" and the unmarked 
"bar.h"), but I'm a little worried about #include order. If there are uses from 
"foo.h" between the naked and the kept include, things will start to break.

So I think the current behavior is reasonably conservative and likely to work, 
if not optimal. Would you like me to add a separate test case for these?

I haven't given line number checking any thought. I'm not familiar with the 
details of difflib and its use here. iwyu_test_util.py seems to pull some 
tricks in _NormalizeSummaryLineNumbers, but that's only for a single line; we'd 
like to express constraints between lines...

And yes, that's part of the reason for the fuzzy "surplus" term, that more 
exceptions will accrue there. I'm way ahead of you :-)

Original comment by kim.gras...@gmail.com on 7 Sep 2014 at 6:33

GoogleCodeExporter commented 9 years ago
Here's a patch with additional tests for the different combinations of keep and 
no keep. I'm not sure it adds any additional clarity.

You decide which patch you prefer, I don't have any strong opinions here.

I investigated the line numbers, but I couldn't find a way to specify more 
precisely.

Original comment by kim.gras...@gmail.com on 8 Sep 2014 at 8:02

Attachments:

GoogleCodeExporter commented 9 years ago
I think we should keep the first include even if it doesn't have IWYU pragma: 
keep.  At first I've thought we should remove it, but doing so is confusing.  
For example, user sees the second include is removed, adds pragma to keep it 
and now the first include is removed.  Why adding pragma to one line affects 
another line?  And it feels like playing whack-a-mole, which is not good.  So 
current behavior is reasonable.

OK, let's not bother with checking line numbers.

I'll review the test case on weekend, I think not all cases are valuable.  For 
instance, my earlier "bar.h" example is quite useless.

Original comment by vsap...@gmail.com on 11 Sep 2014 at 4:13

GoogleCodeExporter commented 9 years ago
Why do we keep the first "...-d21.h"? Looks like it is unused and unused 
includes should be kept only if they are marked with IWYU pragma: keep.

I suggest the following test cases:
    #include "d21.h"
    #include "d21.h"  // IWYU pragma: keep

    #include "d22.h"  // IWYU pragma: keep
    #include "d22.h"

    #include "d23.h"
    #include "d23.h"  // IWYU pragma: keep

    #include "d24.h"  // IWYU pragma: keep
    #include "d24.h"

    use symbol from d21.h
    use symbol from d22.h

It's a cartesian product of {first include is kept, second include is kept} x 
{include is used, include is not used}.

Original comment by vsap...@gmail.com on 14 Sep 2014 at 5:30

GoogleCodeExporter commented 9 years ago
Thanks for reviewing, excellent question! Yeah, that makes no sense, let me run 
this through the debugger.

I'll add tests to cover your suggested matrix, good idea!

Original comment by kim.gras...@gmail.com on 14 Sep 2014 at 6:19

GoogleCodeExporter commented 9 years ago
So here's what happens:

When the preprocessor sees "// IWYU pragma: keep" in MaybeProtectInclude it 
calls ReportIncludeFileUse to add a direct dependency between includer and kept 
_header file_.

This means _all includes_ of said header are protected, not just the one that 
was marked with a keep pragma.

Before my patch, as long as there was one keep pragma, all includes would be 
marked desired, and then the duplicates would be stripped, leaving only one 
(more or less random) include.

I'll have to think about what behavior we really want here and see if I can 
achieve it.

Original comment by kim.gras...@gmail.com on 14 Sep 2014 at 8:40

GoogleCodeExporter commented 9 years ago
I have a patch now that allows keeping of individual include lines. It becomes 
a little more convoluted, but could probably be made to work.

But seeing the test cases made me wonder whether this is actually what we want 
to do.

The prior policy was that if "IWYU pragma: keep" was added, all includes of the 
same header were kept. I'm starting to think that makes more sense.

For comparison, IWYU doesn't attribute uses to individual include lines but to 
a header name. All includes of said header name are equally desired, and then 
duplicate removal makes sure there's only one left.

I think the same thinking holds for keep-marked include lines, except we 
shouldn't remove duplicates. So if an include name is marked "pragma: keep" in 
one include line, all other include lines of the same name would be kept as 
well, whether they're keep-marked or not.

What do you think?

Original comment by kim.gras...@gmail.com on 27 Sep 2014 at 6:40

GoogleCodeExporter commented 9 years ago
I've thought after your comment #22 that IWYU checks if it is necessary to 
include a file, not if it is necessary to have an inclusion directive.  And 
trying to handle inclusion directives together with files might get ugly 
because they are quite different concepts.

You confirm that the logic becomes convoluted.  Let's not push in this 
direction then and keep it simple.  After all, my "d23.h" test case is 
contrived.  I think in real code when IWYU recommends to remove both #include 
"d23.h" lines, user can figure out which one to keep and will add "pragma: 
keep" if necessary.  So fixing "d23.h" doesn't solve any real problem, only 
hypothetical one.

Actionable response: let's use 140keep-additional-test.patch without 
"...-d22.h" ("...-d23.h" will be renamed to "...-d22" to keep numbers 
consecutive).

As the long-term solution I think we should start distinguishing between 
include-as-use-a-module and include-as-paste-content-of-another-file.  IWYU is 
built to handle the former case and gets confused when encounters the latter.  
And we should separate these cases as early as possible.  For example, handling 
include-as-paste-content-of-another-file when IWYU outputs recommendations is 
most likely to be too late.

Original comment by vsap...@gmail.com on 29 Sep 2014 at 3:36

GoogleCodeExporter commented 9 years ago
Thank you for your thoughtful response.

I agree with your conclusion that IWYU reasons in terms of files, not 
individual include directives, and I find your observation that we should 
separate module-like includes from codegen-like includes very interesting for 
the longer term.

I tried to revisit the 140keep-additional-test patch, but I still have some way 
to go with it before I'm happy. I'll post back as soon as I have something 
reasonable.

Original comment by kim.gras...@gmail.com on 1 Oct 2014 at 6:45

GoogleCodeExporter commented 9 years ago
OK, here goes the simplest diff I've been able to reduce that implements the 
keep-all policy.

It introduces a new ReportPragmaKeep method on IwyuFileInfo, so we maintain the 
files that want to be kept.

It's a little silly in that it first clears desired for all duplicates, and 
then re-sets it, but with all the data structures and type mangling going on 
here, this was the clearest I could come up with.

Let me know if this makes sense!

Original comment by kim.gras...@gmail.com on 1 Oct 2014 at 8:47

Attachments:

GoogleCodeExporter commented 9 years ago
I like how simple it is. Constantly changing desired state can look weird at 
the first glance (set desired, clear desired, set desired again). But it is 
simpler than alternative - trying to determine desired state from the first 
attempt. So I am OK with your approach.

Original comment by vsap...@gmail.com on 5 Oct 2014 at 7:16

GoogleCodeExporter commented 9 years ago
r586, whew! Celebration is in order :-)

I'll merge this one to the 3.5 release branch too, right?

Original comment by kim.gras...@gmail.com on 5 Oct 2014 at 7:40

GoogleCodeExporter commented 9 years ago
Yes, sure.

Original comment by vsap...@gmail.com on 5 Oct 2014 at 7:43