Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License
2.57k stars 109 forks source link

Read $HOME config before (and in addition to) the one in $PWD #320

Closed stepnem closed 9 months ago

stepnem commented 9 months ago

Unlike tools like code formatters or linters where a single unified config maintained as part of a code repository or directory tree makes sense, for a search tool it is more ergonomic to always load user customization as a base for further directory-specific adjustments.

Before this, one had to copy and maintain common customizations in each directory-specific .ugrep file.

Now one can put generic settings like coloring, line numbering etc. in the $HOME .ugrep, and only maintain dir/repo-specific settings like includes/excludes in the relevant local .ugrep files.

NB: This is an RFC/PoC: it works for me and the test suite passes, but I'm not very familiar with either the code base or C++. I have also only included a minimal doc change, as there is a lot of redundancy in existing documentation; if we agree this PR is viable, I can include further documentation updates.

genivia-inc commented 9 months ago

I had thought about this before a while back, but didn't like it. The reason is that --save-config uses the home directory .ugrep file anyway to generate a new .ugrep file in the working directory. Therefore, normally we don't need to read the home dir .ugrep and the working dir .ugrep in these "normal" scenarios.

If ugrep is upgraded with new options (as in 4.0), then it might make a small difference to read both if only the home dir .ugrep is modified. Also if you're updating the home dir .ugrep, then just as likely there will be scenarios in which the working dir .ugrep should not be affected (i.e. don't read both).

stepnem commented 9 months ago

I had thought about this before a while back, but didn't like it. The reason is that --safe-config uses the home directory .ugrep file anyway to generate a new .ugrep file in the working directory. Therefore, normally we don't need to read the home dir .ugrep and the working dir .ugrep in these "normal" scenarios.

But, unless I'm missing something, --save-config has all the issues I mention in my commit message (duplication and maintenance, plus all the boilerplate --save-config puts in the config file).

Also if you're updating the home dir .ugrep, then just as likely there will be scenarios in which the working dir .ugrep should not be affected (i.e. don't read both).

Why? The local .ugrep overrides/extends the $HOME one (with my patch), so in that case I would add an override to the local .ugrep, but I certainly expect that to be the exception rather than the rule, i.e., I don't find things like colors or line numbering (which I put into the default $HOME .ugrep) likely to be directory/repo-specific, unlike e.g. includes/excludes.

I find the default ($HOME) + local overrides model both easier to maintain and reason about than what --save-config does (I now reread the documentation and did some tests with --save-config, but still am not sure I quite understand it).

It's also the way e.g. Git configuration works (global .gitconfig in $HOME + repo-specific config overriding the defaults), so I'd expect it to better match other users' expectations as well.

genivia-inc commented 9 months ago

The --save-config is typically used with options supplied to override the current options. That's the simplest way to use ug --save-config OPTIONS (as shown in the docs and on ugrep.com) to enforce the given OPTIONS. Sure, the .ugrep file can be edited if need be, but it is not required and I never needed to do that.

Some options aren't saved by --safe-config because these options impact the search in ways that are likely causing problems, such as conflicting options. For example, option -P is not saved because the user may want to use -Z instead that conflicts.

genivia-inc commented 9 months ago

One thing we should probably do is renaming of the old .ugrep to .ugrep.old in case the user changes his/her mind, like this:

// save a configuration file
static void save_config()
{
  FILE *file = NULL;

  // rename old .ugrep to .ugrep.old when present
  if (strcmp(flag_save_config, ".ugrep") == 0 && std::rename(".ugrep", ".ugrep.old") == 0)
  {
    errno = EEXIST;
    warning("renamed existing .ugrep to .ugrep.old", NULL);
  }

  // if not saved to standard output ("-"), then inform user
  if (!flag_no_messages && strcmp(flag_save_config, "-") != 0)
    fprintf(stderr, "ugrep: saving configuration file %s\n", flag_save_config);

This also reports saving the new configuration file.

I really would like to get suggestions and feedback on these kind of little things that can make a difference.

stepnem commented 9 months ago

On Tue, 21 Nov 2023 08:10:56 -0800 Robert van Engelen wrote:

The --save-config is typically used with options supplied to override the current options. That's the simplest way to use ug --save-config OPTIONS (as shown in the docs and on ugrep.com) to enforce the given OPTIONS. Sure, the .ugrep file can be edited if need be, but it is not required and I never needed to do that.

Some options aren't saved by --safe-config because these options impact the search in ways that are likely causing problems, such as conflicting options. For example, option -P is not saved because the user may want to use -Z instead that conflicts.

Thank you for the explanation.

This sounds rather opaque and complex, basically a kind of hard-coded DWIM logic.

I guess our take on this is just too different; seeing you steer this PR in the direction of discussing possible improvements to --save-config (without any relation to the crux of this PR that I can see) only reinforces that impression.

I don't see myself endorsing --save-config (so I can't really provide the feedback you ask for, and am not sure others will notice such a request in this PR, either), so I guess it's either continuing to patch ugrep locally or supplying the options explicitly. It's not a huge deal, after all, the other CLI search tools I know don't have this functionality, either.[1]

Thanks.

[1] Actually, this made me recheck, and apparently ack does exactly what I (and I'm sure many other Unix users) expect: https://beyondgrep.com/documentation/ack-v3.7.0-man.html#ACKRC-LOCATION-SEMANTICS (But ripgrep or ag do not.)

genivia-inc commented 9 months ago

If anyone really wants to inherit .ugrep configurations, why not add an #include directive that reads another configuration file? It's not difficult to add such a feature. Or perhaps allow config=FILE in a .ugrep file to load another config FILE. But we need to watch out for duplicate/circular references to handle them properly and also decide what to do when a config file fails to open (probably abort).

On Tue, 21 Nov 2023 08:10:56 -0800 Robert van Engelen wrote: The --save-config is typically used with options supplied to override the current options. That's the simplest way to use ug --save-config OPTIONS (as shown in the docs and on ugrep.com) to enforce the given OPTIONS. Sure, the .ugrep file can be edited if need be, but it is not required and I never needed to do that. Some options aren't saved by --safe-config because these options impact the search in ways that are likely causing problems, such as conflicting options. For example, option -P is not saved because the user may want to use -Z instead that conflicts.

Thank you for the explanation. This sounds rather opaque and complex, basically a kind of hard-coded DWIM logic.

If running ug with an option that abort with an error message and requires editing the .ugrep file to make ug work again, how is that a bad ("opaque") thing? Using conflicting options fails or aborts with an error message, like GNU grep. Some options can interfere negatively.

If we allow --save-config to save certain options that can conflict with others, then we will probably see a lot of issues opened in this repo that complain that something with ugrep is not working in some dir but in other dirs work fine.

It can be worse if files are no longer found and are unexpectedly silently ignored, because of interference, e.g. specifying conflicting patterns or file types to (not) search for example.

You can explicitly specify anything in a .ugrep file if you want, since any of the long options can be specified, but we should not do this automatically with --save-config. It is better to be cautious than to be overly adventurous.

If you want transparency, then I invite you to add warning messages when options are specified on the command line that are not saved to .ugrep.

stepnem commented 9 months ago

On Wed, 22 Nov 2023 10:00:43 -0800 Robert van Engelen wrote:

If anyone really wants to inherit .ugrep configurations, why not add an #include directive that reads another configuration file? It's not difficult to add such a feature. Or perhaps allow config=FILE in a .ugrep file to load another config FILE. But we need to watch out for duplicate/circular references to handle them properly and also decide what to do when a config file fails to open (probably abort).

Yes, that would work, too, being even strictly more flexible than the inheritance model (I did consider that option, but the inheritance seemed simpler to implement, reason about and behaviorally preferable to me; but if you'd be willing to add the include functionality, that'd be great and should shut people like me up).

The --save-config is typically used with options supplied to override the current options. That's the simplest way to use ug --save-config OPTIONS (as shown in the docs and on ugrep.com) to enforce the given OPTIONS. Sure, the .ugrep file can be edited if need be, but it is not required and I never needed to do that. Some options aren't saved by --safe-config because these options impact the search in ways that are likely causing problems, such as conflicting options. For example, option -P is not saved because the user may want to use -Z instead that conflicts.

Thank you for the explanation. This sounds rather opaque and complex, basically a kind of hard-coded DWIM logic.

If running ug with an option that abort with an error message and requires editing the .ugrep file to make ug work again, how is that a bad ("opaque") thing?

I called the behavior you described opaque because the user is left in the dark as to what's happening and what the rules are (i.e., which options are saved or not and why), and complex/DWIM because the tool is trying to decide/compensate for the user rather than behaving in a consistent, transparent way.

I prefer explicit error message in case of conflicting options to silent (or any, really) adjustments; the options should always work (or not work) the same way, no matter if they're specified on the command line or in a config file; later options overriding previous options. I believe that's a common expectation at least of a Unix user.

If you want transparency, then I invite you to add warning messages when options are specified on the command line that are not saved to .ugrep.

Yeah, I'd say that being explicit about it is the least the tool should do, but given how alien the whole --save-config concept is to me I'd hesitate to play the judge here. Perhaps there is user demographic that welcomes the current behavior?

genivia-inc commented 9 months ago

There are alternate designs in this case, all with pros and cons. It's all about balancing the pros and cons. It's not about who is right or wrong. Inheriting config files sounds great to me, but it is easy to overlook the (unintended) consequences. For one, when ug no longer works as expected with this approach, then it becomes much harder to figure out why when two config files are involved in combination with the command-line options passed to the failing ug invocation.

Note that sift uses config files that are saved with an option and only saves a subset of options (I guess for the same reasons I've outlined.) But sift does have two configs, which in my opinion is unnecessary when saving a stand-alone config file. By contrast, I wanted the config file to be portable to mounted drives and to other user's drives and not make things more complicated. The sift --write-config option is supposed to be used for this but the config file won't be stand-alone (editing a sift config file is possible, but it's JSON and not expected). Sift also has --print-config to figure out what options are actually enabled when the two configs are read. Keeping things as simple as possible is better IMHO.

Yeah, I'd say that being explicit about it is the least the tool should do, but given how alien the whole --save-config concept is to me I'd hesitate to play the judge here. Perhaps there is user demographic that welcomes the current behavior?

Right. I totally lost you after this one. No need to be sarcastic or rude. A serious technical conversation doesn't benefit from this kind of attitude. My motto is to be helpful in a respectful way and not be confrontational on matters of opinion.

stepnem commented 9 months ago

There are alternate designs in this case, all with pros and cons. It's all about balancing the pros and cons. It's not about who is right or wrong. Inheriting config files sounds great to me, but it is easy to overlook the (unintended) consequences. For one, when ug no longer works as expected with this approach, then it becomes much harder to figure out why when two config files are involved in combination with the command-line options passed to the failing ug invocation.

Could you be more specific about your concerns? I've tried to explain what I believe is a simple (both to implement and reason about), transparent model (used by many other CLI tools, some of which I also mentioned) repeatedly, and fail to see how either the inheritance/extension or the include-based model would lead to issues you allude to (but unfortunately don't describe in more detail).

Keeping things as simple as possible is better IMHO.

Absolutely, but our notions of "simple" appear to differ considerably (there might also be some element of misunderstanding, see below).

Yeah, I'd say that being explicit about it is the least the tool should do, but given how alien the whole --save-config concept is to me I'd hesitate to play the judge here. Perhaps there is user demographic that welcomes the current behavior?

Right. I totally lost you after this one. No need to be sarcastic or rude.

I am very sorry if that's the impression you've got from that paragraph, but nothing could be farther from my intention.

I was being serious: I trust you put a lot of thought and deliberation into ugrep design and implementation, and I find it unlikely that you are the only one who considers the current behavior (of --save-config in particular) reasonable, it's just that I don't think I'm the best person to consult regarding the UX or further development of --save-config, as I'm not its user or proponent (I used "demographic" because I imagine there are different user groups with different preferences or backgrounds).

All I can do is try to explain my opinion and arguments, try to understand yours, and let happy users and advocates of --save-config (whose existence I have no reason to doubt) argue their position.

A serious technical conversation doesn't benefit from this kind of attitude. My motto is to be helpful in a respectful way and not be confrontational on matters of opinion.

Very much agreed (though I'd say confrontation doesn't hurt as long as it is rational, argument-based and civil; avoiding confrontation at all costs might make it difficult to express and learn different opinions in the first place).

Thank you for your patience!

genivia-inc commented 9 months ago

I've added the ability to the ugrep v4.3.4 release to specify config and config=FILE in config files to import a config file. See also the v4.3.4 release notes.

The manual is updated to explain the use of this approach as an alternative instead of --save-config. Also the explanation and use of --save-config is updated in the manual.

I believe this update offers the best of both worlds.

Rationale: changing the behavior of --config and the ug command is not something to be done without a very good reason. Others may use config files in ways that may break after an update, when the update reads two config files by default.

stepnem commented 9 months ago

I've added the ability to the ugrep v4.3.4 release to specify config and config=FILE in config files to import a config file. See also the v4.3.4 release notes.

That works for me, thank you!