BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense
1.24k stars 107 forks source link

explore traversal that yields directory entry twice #22

Closed windwardly closed 6 years ago

windwardly commented 7 years ago

This is about issue #18 and PR #19 . Aside: the more I dug in, the more I learned, and I'm grateful to all involved for the work on walkdir.

@mcharsley, I played around with contents_first, and for the most part, liked it.

However, it doesn't seem to work with filter_entry() or skip_current_directory() -- the yielded entries look to be skipping the remainder of the directory the skipped directory is in, as opposed to the skipped directory.

Additionally, I was looking for an option which also included the initial directory.

I see @michaelsproul proposed something which yields both edges, and I played around with a modified version of that and seemed to be able to do what I was looking for.

I also played around with writing an iterator adapter which also works okay, but didn't feel quite right, as it didn't work to implement WalkDir, so skip_current_directory was outside of that.

I'm torn between a simpler walkdir plus another crate for an adapter controlling when and how often a directory is listed, and an integrated version, making use of the stacks that are already there. I could see either way, either rolling back the contents_first commit or adding to it to list the directory twice and have it working with filtering.

mcharsley commented 7 years ago

I should be able to take a look over the next few days, and see if I can fix things. Can you give an exact example of the behaviour you expect so I can make sure I fix the right thing :-)

Mark

On Tue, May 30, 2017 at 7:49 PM, windwardly notifications@github.com wrote:

This is about issue #18 https://github.com/BurntSushi/walkdir/issues/18 and PR #19 https://github.com/BurntSushi/walkdir/pull/19 . Aside: the more I dug in, the more I learned, and I'm grateful to all involved for the work on walkdir.

@mcharsley https://github.com/mcharsley, I played around with contents_first, and for the most part, liked it.

However, it doesn't seem to work with filter_entry() or skip_current_directory() -- the yielded entries look to be skipping the remainder of the directory the skipped directory is in, as opposed to the skipped directory.

Additionally, I was looking for an option which also included the initial directory.

I see @michaelsproul https://github.com/michaelsproul proposed something which yields both edges, and I played around with a modified version https://github.com/windwardly/walkdir/commit/dbd02660fa72c62c307f98ce81a57bb21f4a1dd3 of that and seemed to be able to do what I was looking for.

I also played around with writing an iterator adapter https://github.com/windwardly/walkdir/commit/d84979913e9d520e8c44111086f8789070c5a420 which also works okay, but didn't feel quite right, as it didn't work to implement WalkDir, so skip_current_directory was outside of that.

I'm torn between a simpler walkdir plus another crate for an adapter controlling when and how often a directory is listed, and an integrated version, making use of the stacks that are already there. I could see either way, either rolling back the contents_first commit or adding to it to list the directory twice and have it working with filtering.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BurntSushi/walkdir/issues/22, or mute the thread https://github.com/notifications/unsubscribe-auth/ALbIF9OtOVDlV-fdLKbpjb5g6l6qx1vqks5r_GTEgaJpZM4NqsoN .

windwardly commented 7 years ago

... an exact example of the behaviour you expect... ?

How's this line of thinking? For files, filter_entry filters only specific files, and for skip_current_dir filters the rest of the directory.

Since contents_first already yielded the contents of the directory, I would expect the handling of the directory to be the same as how files are handled: filter_entry would just ignore the directory entry, and skip_current_directory would skip the remainder of the directory.

With this view, skip_current_dir works and filter_entry currently doesn't.

For the filter_entry case, if filtering all directories, then all files would be listed.

I made a contents_first_filter branch with 1 failing and 1 passing test.

mcharsley commented 7 years ago

Your test looks reasonable. Not sure why you'd want to call skip_current_dir in a contents-first mode, mind - but your test describes what I'd expect if you did. :-)

I'll try and take a look at filter_entry some time this week.

Mark

On Wed, May 31, 2017 at 11:42 PM, windwardly notifications@github.com wrote:

... an exact example of the behaviour you expect... ?

How's this line of thinking? For files, filter_entry filters only specific files, and for skip_current_dir filters the rest of the directory.

Since contents_first already yielded the contents of the directory, I would expect the handling of the directory to be the same as how files are handled: filter_entry would just ignore the directory entry, and skip_current_directory would skip the remainder of the directory.

With this view, skip_current_dir works and filter_entry currently doesn't.

For the filter_entry case, if filtering all directories, then all files would be listed.

I made a contents_first_filter https://github.com/windwardly/walkdir/tree/contents_first_filter branch with 1 failing and 1 passing test https://github.com/windwardly/walkdir/commit/e85266aa286de5463a70663bd8fd5ba954b7d612 .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BurntSushi/walkdir/issues/22#issuecomment-305338773, or mute the thread https://github.com/notifications/unsubscribe-auth/ALbIF5aXnOCL3IfS0O2Jp27N5CzNTRmuks5r_ezOgaJpZM4NqsoN .

windwardly commented 7 years ago

Thanks. I wasn't looking to call skip_current_dir as much as I was looking for a model which I could predict how things work, that the parts hang together, that it's been thought about, and that I can trust. This helps. I appreciate it.

BurntSushi commented 6 years ago

OK, so I finally had a chance to look at this, and I must confess, I don't see any particular problem with the interaction between contents_first and skip_current_dir or filter_entry. skip_current_dir seems to work exactly as I'd expect; as soon as you call it, walkdir skips descending and pops up the stack. In the case of filter_entry, the method really just doesn't make sense to call at all. The documentation (which is a bit clearer now perhaps than it was when this issue was filed) states that it won't descend into directories that fail the predicate, but this is exactly at odds with contents_first, which was explicitly requested by the caller.

With that said, I grant that there might be a use case for a traversal that yields the directory entry twice, but you should actually be able to layer that on top of a normal traversal yourself. It's definitely a bit more work, but I'm OK with niche use cases requiring more work.

I'm going to leave this open for now in case anyone wants to rigorously explore the design space here, but IMO, we're fine as is and can move forward.

windwardly commented 6 years ago

Thanks for checking into this. I learned a lot and will submit a new issue if I hit a wall.