GrieferAtWork / tpp

Tiny PreProcessor
Other
17 stars 1 forks source link

Incorrectly cleaning up ..\ sequences in file name #2

Closed HenryKaufman closed 3 years ago

HenryKaufman commented 3 years ago

Hello,

First I just want to say that I evaluated several open source preprocessors and TPP is working the best from what I've seen (aside from gcc itself). That being said, I did find one bug that I fixed that I'd like to share with you. I have a case where my relative include file goes up a few levels, as in #include "..\..\..\..\file.h". In do_fix_filename, the ..\'s were being removed except for the last one. I think it's because the algorithm expected a sequence like someDir\.. which can be eliminated because the .. cancels the someDir, but this doesn't work when there are several ..\s in a row. So basically I added a check for "..". Here is my fix (in do_fix_filename):

      case 2:

if (text_iter[1] == '.' && (text_iter - 2) > base) { char prev_folder_start; // Try to remove a `..' path reference. prev_folder_start = (char )memrchr(base, SEP, (size_t)((text_iter - 2) - base)); if (!prev_folder_start) { base = slash + 1; break; } if (prev_folder_start[1] != '.' && prev_folder_start[2] != '.') { // <<<<- fix the case of /../.. getting cleared out // Move everything from slash' toprev_folder_start'. if (text_end == slash) prev_folder_start = '\0'; else memmove(prev_folder_start, slash, (size_t)((text_end - slash) + 1) sizeof(char)); text_end -= (size_t)(slash - prev_folder_start); slash = prev_folder_start; } }

It fixed my case, but I didn't check if it broke other cases. You are welcome to incorporate this into your codebase.

Thanks!

-Henry

GrieferAtWork commented 3 years ago

Honestly, I'm unable to reproduce the bug you're reporting. I mean don't get me wrong: I can see what you mean: It looks like I just blindly remove whatever comes before the .., but there already is a special case that handles what you're describing (the special case is even visible in your "patch"):

                /* Try to remove a `..' path reference. */
                prev_folder_start = (char *)memrchr(base, SEP, (size_t)((text_iter - 2) - base));
                if (!prev_folder_start) {
                    base = slash + 1; /* <<< NEW COMMENT: This right here prevents the problem you're describing (?at least for me?) */
                    break;
                }

When TPP is unable to find a SEP (aka. '/' or '\\') prior to the .., it will move the base-location of the filename-part it searches in later passes up until after the most recent SEP. In other words, if you follow the logic for a string like "..\..\..\..\file.h", that looks like follow:

  1. text_iter = base = filename; --> all are set to "..\..\..\..\file.h"
  2. slash = strchr(text_iter, SEP); --> slash = "\..\..\..\file.h"
  3. partsize = (size_t)(slash - text_iter); --> partsize = 2
  4. if (partsize >= 1 && *text_iter == '.') --> yes
  5. case 2:
  6. if (text_iter[1] == '.' && (text_iter - 2) > base) --> no (because text_iter == base)
  7. text_iter = slash + 1; --> text_iter = "..\..\..\file.h"
  8. slash = strchr(text_iter, SEP); --> slash = "\..\..\file.h"
  9. partsize = (size_t)(slash - text_iter); --> partsize = 2
  10. if (partsize >= 1 && *text_iter == '.') --> yes
  11. case 2:
  12. if (text_iter[1] == '.' && (text_iter - 2) > base) --> yes (because text_iter - base == 3)
  13. prev_folder_start = (char *)memrchr(base, SEP, (size_t)((text_iter - 2) - base)); --> prev_folder_start = NULL (because (size_t)((text_iter - 2) - base) == 1 && base[0] != SEP)
  14. base = slash + 1;
  15. text_iter = slash + 1;
  16. At this point, text_iter = base = "..\..\..\file.h". Iow: the leading "..\" were skipped but not removed (base is only used for this dont-remove-leading-..-sequence), and everything repeats at step 1., only this time the string left to scan is 3 characters shorter.

So what you're describing shouldn't happen. And as a matter of fact: ?doesn't happen for me?

Or to put the logic in 1 sentence: Only something like "\dir\.." is removed, because the area searched for SEP is set such that it starts after any leading "..\..\..\" sequences

On a related note, I just noticed TPP failing to open files when using -fno-canonical-system-headers (which is the extension that causes fix_filename() (and do_fix_filename()) to be called; the path where that function wasn't called forgot to NUL-terminate the filename, so it tried to do something like open("..\\..\\..\\..\\def.h\"\n\nfoo\nbar\n...")). But I don't believe that this was your problem since this was a bug outside of fix_filename().

Please provide more information on how to reproduce this bug.

HenryKaufman commented 3 years ago

My case was:

#include ".\..\..\..\..\dir1\dir2\dir3\file.h"

and I got out "..\file.h" or "..\dir1\dir2\dir3\file.h" (I can't remember which) which was not found, of course. So it's possible the leading ".\" case was not taken into account in your logic.

GrieferAtWork commented 3 years ago

Thanks. I was able to reproduce it with your new information. It really was related to the leading ".\" which didn't update the base used by the dont-remove-leading-parent-references logic. So as a consequence, it didn't consider a leading "..\" to actually be leading when following a ".\".

Test added in: d326f21c283af06594539e13397cc83b672c6455 Fixed in: bf7fa42fb091743da3a4aee7b899af2186d6fece