dazinator / DotNet.Glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.
MIT License

File Enumeration #93

Closed mathgeniuszach closed 1 month ago

mathgeniuszach commented 1 month ago

How do you use this library to efficiently enumerate a list of files on the given machine (i.e., glob for files)? Looping over every single file on the drive is not a feasible solution.

dazinator commented 1 month ago

This project lets you evaluate strings against a glob. How you obtain the strings and how you iterate them is up to you. For memory efficiency, span-based APIs are available, but a string still has to be in memory to be evaluated. If you want to evaluate file names, you have to bring them into memory somehow: enumerate the files, pull them from a database or a cache, or iterate them one at a time and use a span. It sounds like you want to interrogate the file system directly, and that iterating the files is not feasible for your project. In that case you want something native to the file system that uses its own support for searching files. This project is not for that. You want something more OS-level, or specific to the file system you are searching. Sorry!

This project is more for evaluating globs in memory in dotnet.
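
To make the "bring the strings yourself" model concrete, here is a minimal sketch of feeding lazily enumerated file paths to the matcher. It assumes the DotNet.Glob NuGet package is referenced and a runtime with `EnumerationOptions` (.NET Core 2.1 or later); `GlobFileFilter` and `Find` are hypothetical names chosen for illustration, not part of the library. Note that it still visits every file under the root, which is exactly the cost discussed in this thread.

```csharp
// Sketch only: requires the DotNet.Glob NuGet package.
using System;
using System.Collections.Generic;
using System.IO;
using DotNet.Globbing;

static class GlobFileFilter
{
    // Lazily walks `root` and yields the files whose path, relative to `root`,
    // matches `pattern`. Hypothetical helper, not part of DotNet.Glob.
    public static IEnumerable<string> Find(string root, string pattern)
    {
        Glob glob = Glob.Parse(pattern);

        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true // skip directories we are not allowed to read
        };

        // EnumerateFiles streams results instead of building the full list first.
        foreach (string path in Directory.EnumerateFiles(root, "*", options))
        {
            // Use forward slashes so the same pattern can be used on every OS.
            string relative = Path.GetRelativePath(root, path)
                                  .Replace(Path.DirectorySeparatorChar, '/');
            if (glob.IsMatch(relative))
                yield return path;
        }
    }
}
```

A call such as `GlobFileFilter.Find(@"C:\projects", "**/*.txt")` would then stream matching paths one at a time rather than holding the whole file list in memory.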

mathgeniuszach commented 1 month ago

You explicitly provide the ability to evaluate filepath strings against globs, not just regular strings. If it is just for regular strings, why do you have support for **?

It almost feels like there is an implicit expectation that users enumerate all the files in a directory manually and then hand them to this library in memory. That is not realistic for sufficiently large filesets with many files and subdirectories to loop through, many of which could be ruled out more efficiently with a direct approach.

What is this project's main goal and use case? To locate matching files, or simply to match strings?
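
One way to read the "direct approach" mentioned here is to split the glob into its leading literal directories and the remaining wildcard part, so that enumeration can start below the literal directories and never touch unrelated trees. The following is a minimal sketch of that idea using only the base class library. It assumes relative patterns written with forward slashes; `GlobRoot` and `Split` are hypothetical names, and nothing here is provided by DotNet.Glob.

```csharp
using System;
using System.Linq;

static class GlobRoot
{
    // Characters that mark a path segment as a wildcard segment (an assumption
    // covering the common glob syntax: *, ?, [...], {...}).
    static readonly char[] Wildcards = { '*', '?', '[', '{' };

    // Splits a pattern into the leading segments that contain no wildcards
    // and the rest of the pattern.
    public static (string LiteralDir, string Rest) Split(string pattern)
    {
        string[] segments = pattern.Split('/');
        int firstWild = Array.FindIndex(segments, s => s.IndexOfAny(Wildcards) >= 0);

        // No wildcard at all: treat everything except the last segment as the directory.
        if (firstWild < 0) firstWild = segments.Length - 1;

        string literalDir = string.Join("/", segments.Take(firstWild));
        string rest = string.Join("/", segments.Skip(firstWild));
        return (literalDir, rest);
    }
}

// GlobRoot.Split("src/app/**/*.cs") returns ("src/app", "**/*.cs")
// GlobRoot.Split("*.txt")           returns ("", "*.txt")
```

With this split, only the directory named by `LiteralDir` needs to be walked, and `Rest` is the pattern to evaluate against paths relative to that directory.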

dazinator commented 1 month ago

The main purpose of this library is to match strings using a glob syntax in dotnet. If you want to go directly to a file system and query it without iterating files, then this library (and dotnet in general) won't help you very much.

Is your file system on Windows? And if so, do you have the Windows Search indexing service enabled? If so, you can run a SQL-like query for files, which performs much better than iterating files on the client because it uses the index (according to ChatGPT):


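using System;
using System.Data.OleDb; // Windows only; on .NET Core / .NET 5+ this comes from the System.Data.OleDb NuGet package

class Program
{
    static void Main()
    {
        // Windows Search SQL: find indexed items whose name contains "filename".
        string query = "SELECT System.ItemPathDisplay FROM SystemIndex WHERE System.ItemName LIKE '%filename%'";

        // Search.CollatorDSO is the OLE DB provider exposed by the Windows Search service.
        using (OleDbConnection connection = new OleDbConnection(@"Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"))
        {
            connection.Open();

            using (OleDbCommand command = new OleDbCommand(query, connection))
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                // Each row holds the display path of one matching indexed item.
                while (reader.Read())
                {
                    Console.WriteLine(reader["System.ItemPathDisplay"]);
                }
            }
        }
    }
}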

However, this is not strictly glob-based but more SQL-like, and it will of course only return files in the search index. If it's a Linux file system, I guess you'd need to use shell tools like find. In either case, I'm afraid this library isn't for you.

You also asked why this library exists and why it supports **. There are cases where it's fine to evaluate strings in memory and you want glob matching, and since ** is part of the globbing spec, it is useful to support.

mathgeniuszach commented 1 month ago

Ah, I see. Those are good points! I was looking for a cross-platform library that supports glob-based file location, which is typically what a "glob" library does in other languages (Python, Rust, C/C++, JavaScript, etc.), but from what you're describing, that isn't the point of this library.
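
For readers who land here with the same need, glob-based file location on .NET can be sketched with Microsoft's separate Microsoft.Extensions.FileSystemGlobbing package, which is unrelated to DotNet.Glob. This is a minimal sketch under that assumption; `FileGlobSearch` and `Find` are hypothetical wrapper names, and the package still walks the file system rather than querying an index.

```csharp
// Sketch only: requires the Microsoft.Extensions.FileSystemGlobbing NuGet package.
using System;
using System.Collections.Generic;
using Microsoft.Extensions.FileSystemGlobbing;

static class FileGlobSearch
{
    // Returns the full paths under `root` that match `include`
    // and, if given, do not match `exclude`. Hypothetical helper.
    public static IEnumerable<string> Find(string root, string include, string exclude = null)
    {
        var matcher = new Matcher();
        matcher.AddInclude(include);
        if (exclude != null)
            matcher.AddExclude(exclude);

        // The matcher walks the directory tree itself, starting at `root`.
        return matcher.GetResultsInFullPath(root);
    }
}
```

For example, `FileGlobSearch.Find("/home/me/project", "**/*.cs", "**/bin/**")` asks for all C# files outside any `bin` directory.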

To be honest, I think a pure glob-based library to match strings actually doesn't exist in other languages, because people are comfortable with regex. In that respect, this is a great library and I wish you the best in developing it, even though it was not what I was looking for.

Also, may I recommend closing this instead as "not planned"? I think that would give a better idea of the purpose of this library. The goal isn't file enumeration, so it's not planned.