Hi, like the title says, the glob function (https://github.com/fsspec/adlfs/blob/main/adlfs/spec.py#L576) is slow and inefficient in specific use cases (especially when using Azure Data Lake Gen2's hierarchical namespace), because from the first * onward it stops matching per segment and simply lists everything under the prefix, filtering afterwards.
In our use case, the data is partitioned by a few columns that are often used for filtering, followed by year/month/day(/hour).
So suppose we have files like the one below covering about 3 years, with each filter column having a cardinality of 5 values (as an example) and 1 file per hour:
some/common/prefix/filtercol1=somevalue1/filtercol2=somevalue2/year=2023/month=01/day=12/hour=01/0.parquet
and we then want to query all files for a specific day and filtercol2 value, but don't care about the filtercol1 value, like so:
some/common/prefix/filtercol1=*/filtercol2=somevalue2/year=2023/month=01/day=12/hour=*/*.parquet
We would like to only have to check 5 × 1 × 1 × 1 × 24 = 120 files.
But instead, the implementation of the glob function checks 5 × 5 × 3 × 365 × 24 = 657,000 files (as far as we understand).
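For concreteness, the two counts above can be spelled out (using the example cardinalities: 5 values per filter column, 3 years of data, 1 file per hour):

```python
# Segment-wise matching: only the wildcard segments fan out.
# filtercol1=* (5) x filtercol2=somevalue2 (1) x year (1) x month (1) x day (1 day) x hour=* (24)
targeted = 5 * 1 * 1 * 1 * 24

# Listing everything after the first *: every combination under the prefix.
# filtercol1 (5) x filtercol2 (5) x ~3 years of days (3 * 365) x 24 hours
full_listing = 5 * 5 * 3 * 365 * 24

print(targeted, full_listing)  # 120 657000
```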
We've got a working example that does none of the fancy things of the current _glob function, but saves us about 50 euros per day (difference seen in Azure Cost Management) just by not listing all the files all the time: https://github.com/mh-data-science/adlfs/blob/master/adlfs/spec.py#L689, not to mention the difference in speed.
Known issue: we don't support ** this way (i.e. * is only valid between two / characters, since we're traversing the actual folders), and probably not much of the kwargs handling or other fancy features either.
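The traversal idea described above can be sketched roughly like this (against a local filesystem for illustration; `glob_by_segments` is a hypothetical helper, not the adlfs implementation, and it shares the limitation that wildcards are only valid within a single path segment):

```python
import fnmatch
import os


def glob_by_segments(root, pattern):
    """Expand a glob pattern one path segment at a time.

    Only directories matching the current segment are listed, so a
    wildcard early in the pattern does not force listing the whole
    tree. ``*`` is only valid within a single segment; ``**`` is not
    supported.
    """
    candidates = [root]
    for seg in pattern.split("/"):
        next_candidates = []
        for base in candidates:
            if not any(ch in seg for ch in "*?["):
                # Literal segment: no listing needed, just check existence.
                path = os.path.join(base, seg)
                if os.path.exists(path):
                    next_candidates.append(path)
            elif os.path.isdir(base):
                # Wildcard segment: list only this directory and match names.
                for name in os.listdir(base):
                    if fnmatch.fnmatch(name, seg):
                        next_candidates.append(os.path.join(base, name))
        candidates = next_candidates
        if not candidates:
            break
    return sorted(candidates)
```

On a blob store the `os.listdir`/`os.path.exists` calls would become single-level list operations, which is where the cost saving comes from: literal segments need no listing at all, and each wildcard segment triggers one listing per surviving candidate instead of one listing of the entire subtree.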
So questions:
Are we correct in our understanding of the current _glob function?
If yes, what do we need to do to get this conceptual improvement into the main branch?
Feedback on our glob code and on how to add ** support is always welcome, especially if a pull request is expected.
I'm pretty sure there has been a big regression in the performance of glob at some point. For one of our use cases:
adlfs==2022.11.0: 2 seconds
adlfs==2024.1.0: fails after about 3 hours and never completes.