dbrgn / tealdeer

A very fast implementation of tldr in Rust.
https://dbrgn.github.io/tealdeer/
Apache License 2.0
4.06k stars 123 forks source link

Only download pages matching the user's languages #345

Open niklasmohrin opened 10 months ago

niklasmohrin commented 10 months ago

Closes #335, although there are some follow ups in the comments that we should probably open new issues for.

Also, I opened https://github.com/tldr-pages/tldr/issues/11121 in the process.

niklasmohrin commented 10 months ago

The main motivation for me is honestly that it just feels like the right thing to do. Downloading ~40 languages when most people only need one or two is excessive and extra work which is not in the spirit of tealdeer as I perceive it. I have to admit, this is a tough argument to bring, given that I previously argued against git in favor of simplicity, but I think that the situation is different here since the language-specific zips are only marginally more complicated than the one big zip.

None of this solves the problems you pointed out though, so here is what I think about each of those:

  1. Agree, but could be solved very easily by adding a short "and check that you have the specified language installed"
  2. Agree, and this one is tougher. I think the solution would be to distinguish between "cache languages" (or "update languages") and "query languages". The former is the set of all languages mentioned by the user, i.e. cli arg + config file + env, while the latter is what we currently have in the languages variable in main. (This implies that we should not release this feature until we also figured out the config file part for this feature; doesn't sound like a problem to me)
  3. Not sure, I think we would have to measure to make any conclusion here. If this really is a problem, we could try to spawn some threads, but I doubt that any of this is noticable at all
  4. Not sure again, I think the added complexity is contained in Cache where we reproduce the layout as we currently have it. I agree that there might be other cases where missing languages would lead to unfortunate UX, but I think most of these would be caught by adding the "make sure you configured the languages" sentence to the "page not found" text, which would result in a worst case scenario of users opening issues with suggestions for more inclusions in the "cache languages" set I described in 2.

So all in all, I agree that there are problems, but I am still convinced that language specific downloads are the way to go and that I would have implemented it like this, if I were to write tealdeer from scratch today. What do you think?

dbrgn commented 6 months ago

Agree, but could be solved very easily by adding a short "and check that you have the specified language installed"

Such a message would need to be conditional on whether or not a custom language is enabled in the settings, or via CLI parameters. Otherwise, it might be quite confusing, if I search for a page that doesn't exist in the tldr repository, and get the message that I need to "install some language".

Agree, and this one is tougher. I think the solution would be to distinguish between "cache languages" (or "update languages") and "query languages".

That might be possible, but I think that would add a lot of additional complexity in the codebase that we don't need right now. It's harder to contribute to the code without fully understanding related concepts, and bugs are more likely to happen in certain uncommon combinations / configurations.

All in all, I still think we're better off in the long run if we don't add support for this.

I like to develop a client implementation that is fast, works for the majority of use cases, has a clean codebase and is easy to maintain. I fear that if we start adding more complexity to cover rare use cases, then the codebase becomes unnecessarily complex and the project will start to feel like a day job. And at that point, I'd lose the motivation to keep working on it 🙂

@niklasmohrin would it be OK for you if we closed this PR?

niklasmohrin commented 5 months ago

I don't agree. I think the complexity related to languages is mostly already there, we already have to parse language strings, find the correct directory name for a given language and respect the order. I agree that adding all needed changes to support the features from the issue and from what we discussed here will add complexity, but I think this is just incidental complexity that occurs because we are adding this after the fact. I think that if we sat down and maybe refactored the Cache interface we could come up with something that is harder to use wrong (than this PR) and still supports all desired features.

I suppose we could close the PR if you are unhappy with the trend the code is taking here, but I would not abandon the feature - I still think it is worthwhile to have and a behavior I would expect "good software" to have. After all, we have a lot of code for configuring the style of the output, something that I would consider less of a priority than language specific downloads if I were to write a tldr client from scratch today.

As far as vision goes, I agree that the project should stay maintainable and fast. Still, I see tealdeer as "the sane choice for tldr on the command line" and from that perspective I think that we should not refrain from implementing measures to reduce disk space and internet usage for minimalist users.

I don't know, for some reason I draw the line somewhere between this PR and git downloads :^)

EphemeralDev commented 5 months ago

On windows 11, installed tealdeer via scoop. I used the timeit command of nushell and ran tldr --update. It took 6min 2sec 768ms 912µs 800ns to update the cache. Other details, this was done over WiFi and the laptop in question does have a slower HDD.

dbrgn commented 5 months ago

I agree that adding all needed changes to support the features from the issue and from what we discussed here will add complexity, but I think this is just incidental complexity that occurs because we are adding this after the fact. I think that if we sat down and maybe refactored the Cache interface we could come up with something that is harder to use wrong (than this PR) and still supports all desired features.

To be honest, I doubt that. But if you could pull that off, I wouldn't be opposed to merging it. My main worry is really the added complexity and the increased maintenance burden.

How should we proceed?

niklasmohrin commented 4 months ago

Okay, so I will sit down and do a big refactor when I find the time for it. Let's keep this draft open for visibility

niklasmohrin commented 3 months ago

Okay, what do you think about the following:

// new_cache.rs

use std::{path::Path, time::Duration};

use crate::{cache::PageLookupResult, types::PlatformType};

use anyhow::Result;

pub struct Language<'a>(&'a str);

pub struct CacheConfig<'a> {
    pub pages_directory: &'a Path,
    pub custom_pages_directory: Option<&'a Path>,
    pub platforms: &'a [PlatformType],
    pub languages: &'a [Language<'a>],
}

/// The directory backing this cache is checked to be populated at construction.
pub struct Cache<'a> {
    config: CacheConfig<'a>,
}

impl<'a> Cache<'a> {
    pub fn open(config: CacheConfig<'a>) -> Result<Option<Self>> {
        todo!()
    }

    pub fn open_or_create(config: CacheConfig<'a>) -> Result<Self> {
        todo!()
    }

    pub fn age(&self) -> Result<Duration> {
        todo!()
    }

    pub fn list_pages(&self) -> impl IntoIterator<Item = String> {
        []
    }

    pub fn find_page(&self, command: &str) -> Option<PageLookupResult> {
        todo!()
    }

    pub fn clear(self) -> Result<()> {
        todo!()
    }

    pub fn update(&mut self, archive_url: &str) -> Result<()> {
        todo!()
    }
}

In main, we would use it like so:

let cache_config = new_cache::CacheConfig {
    pages_directory: &config.directories.cache_dir.path.join("tldr-main"),
    custom_pages_directory: config
        .directories
        .custom_pages_dir
        .as_ref()
        .map(PathWithSource::path),
    platforms: todo!(),
    languages: todo!(),
};

let cache = if args.update {
    new_cache::Cache::open_or_create(cache_config)?
} else {
    new_cache::Cache::open(cache_config)?.context("Cache not found, run `tldr --update`.")?
};

if args.clear_cache {
    cache.clear()?;

    clear_deprecated_cache(&config)?;

    return Ok(());
}

if should_update_cache(todo!(), &args, &config) {
    cache.update(todo!())?;

    clear_deprecated_cache(&config)?;
}

if args.list {
    for page in cache.list_pages() {
        println!("{}", page);
    }
    return Ok(());
}

ensure!(
    !args.command.is_empty(),
    "CLI parsing should have not allowed this!"
);

// Note: According to the TLDR client spec, page names must be transparently
// lowercased before lookup:
// https://github.com/tldr-pages/tldr/blob/main/CLIENT-SPECIFICATION.md#page-names
let command = args.command.join("-").to_lowercase();

let Some(lookup_result) = cache.find_page(&command) else {
    if !args.quiet {
        print_warning(
            enable_styles,
            &format!(
                "Page `{}` not found in cache.\n\
                     Try updating with `tldr --update`, or submit a pull request to:\n\
                     https://github.com/tldr-pages/tldr",
                &command
            ),
        );
    }

    bail!("Page not found")
};

if let Err(ref e) = print_page(&lookup_result, args.raw, enable_styles, args.pager, &config) {
    print_error(enable_styles, e);

    bail!(e)
}

Ok(())

One thing that I paid attention to is that the cache should not care about outputting message to the user - no enable_styles anywhere, only unexpected errors with Err or expected errors with Option. I think I had already written something like that at some point, but I think that we should in general refactor our user output with some kind of OutputManager or so. We could then manage what kind of messages are printed in which way in a separate module. Maybe we can even be fancy and do something similar to anyhow::Context so that we have something like

enum DomainError {
    CacheNotFound,
    PageNotFound(&str),
    // ...
}

Cache::open(cache_config)?.or_domain_error(&output_manager, DomainError::CacheNotFound)?

... but that is a topic for another time.

Another thing: I returned early from main here if for example the cache is cleared. I think it is reasonable, I don't see why tldr --clear-cache --update or tldr --clear-cache tar or tldr --list tar should be allowed. I think this makes more sense and it also makes the code easier.

Note that I already put it clear_deprecated_cache in reasonable places - the implementation is very simple:

fn clear_deprecated_cache(config: &Config) -> anyhow::Result<()> {
    let deprecated_config = new_cache::CacheConfig {
        pages_directory: &config.directories.cache_dir.path().join("tldr-master"),
        custom_pages_directory: None,
        platforms: &[],
        languages: &[],
    };

    let Some(cache) = new_cache::Cache::open(deprecated_config)? else {
        return Ok(());
    };

    cache.clear()?;

    Ok(())
}

So, what do you think? @dbrgn