google-research / deduplicate-text-datasets

Apache License 2.0

called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" } #51

Open bingkunyao opened 2 months ago

bingkunyao commented 2 months ago

Following the "A full end-to-end single file deduplication example" section in the README, I tried to run `bash scripts/deduplicate_single_file.sh /home/user/deduplicate-text-datasets/test_reduce/testfile.csv /home/user/deduplicate-text-datasets/test_reduce/test_result 400 4` and encountered the error below:

```
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
```

I am sure that both the file and the path exist. I also ran `ulimit -Sn 100000`, but it did not help. Note that the CSV file is large (about 2.0 GB). Could anyone help me with this problem?
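For context, a quick illustration (not the repo's code) of why this error does not necessarily mean the input file is missing: os error code 2 (`ENOENT`), which Rust surfaces as `Os { code: 2, kind: NotFound }`, is also raised when a *parent directory* in an output path does not exist. The path below is hypothetical:

```python
import errno

# Opening a file for writing inside a directory that does not exist
# raises FileNotFoundError with errno 2 (ENOENT) -- the same os error
# code Rust reports as Os { code: 2, kind: NotFound }.
try:
    open("no_such_dir/out.table.bin", "wb")  # hypothetical missing dir
except FileNotFoundError as e:
    print(e.errno == errno.ENOENT)  # -> True
```

So the error can come from a missing output directory even when the input file and its path are fine.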

trebedea commented 1 month ago

Do you have a tmp subdirectory in the directory containing the dataset you are indexing / using as a reference when checking for duplicates?

If you look here in the script that creates the suffix array, there is a relative path for `tmp/out.table.bin`:

https://github.com/google-research/deduplicate-text-datasets/blob/4e9888ac3f95dc4f6169867a04c4c19df02dafe3/scripts/make_suffix_array.py#L91-L95

I guess it is a bug; it should probably have been /tmp, as in other places in the codebase. I also ran into this problem yesterday, so maybe this helps you or other people trying to use the tool.
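A workaround sketch, assuming the diagnosis above is right (the script writes to the relative path `tmp/out.table.bin`): create a `tmp/` subdirectory in the working directory from which you launch the script, before running it.

```python
import os

# Workaround sketch: if make_suffix_array.py writes its intermediate
# table to the relative path "tmp/out.table.bin", the "tmp" directory
# must exist under the current working directory. Create it up front.
os.makedirs("tmp", exist_ok=True)
```

Equivalently, a `mkdir -p tmp` in the shell before invoking `scripts/deduplicate_single_file.sh` should have the same effect.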