fizwit / filesystem-reporting-tools

Tools to help system administrators manage very large file systems. pwalk
GNU General Public License v2.0

Unicode in file names #13

Closed fizwit closed 6 years ago

fizwit commented 6 years ago

pwalk discards files which have Unicode (UTF-8) characters in the file name. I have changed this filter to allow all Unicode characters in file names to pass through to the output. WARNING: if your downstream use of pwalk includes a database, this could break bulk loading of data.

The old behavior of removing Unicode was implemented to allow bulk loading into a database that only supported ASCII characters. Our database backends have improved and now fully support UTF-8.
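
For anyone whose downstream database still accepts only ASCII, a pre-load check along these lines may help. This is a minimal Python sketch and not part of pwalk (which is written in C); the helper name `partition_by_ascii` is hypothetical, and it assumes the file names have already been pulled out of the pwalk output into a list of strings.

```python
from typing import Iterable, List, Tuple


def partition_by_ascii(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split names into (ascii_only, non_ascii) lists.

    Hypothetical helper, not part of pwalk: it only illustrates how an
    ASCII-only loader could set aside names it cannot store.
    """
    ascii_only: List[str] = []
    non_ascii: List[str] = []
    for name in names:
        # str.isascii() (Python 3.7+) is True when every code point is below U+0080.
        if name.isascii():
            ascii_only.append(name)
        else:
            non_ascii.append(name)
    return ascii_only, non_ascii
```

The non-ASCII names could then be logged or loaded separately instead of failing the whole bulk load.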

fizwit commented 6 years ago

Unicode characters in file names are now supported. The only characters still treated as invalid in file names are control characters and the null character.
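
As a rough illustration of the rule described in the comment above, the check could look like the following. This is a Python sketch for readers, not pwalk's actual C code; the function name `has_invalid_chars` is hypothetical, and it assumes "control characters" means U+0000 through U+001F plus DEL (U+007F).

```python
def has_invalid_chars(name: str) -> bool:
    """Return True if name contains NUL or an ASCII control character.

    Assumption (not confirmed by the source): control characters are
    U+0000-U+001F and DEL (U+007F). Everything else, including all
    non-ASCII Unicode characters, is allowed through.
    """
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name)


# Example: Unicode names pass, names with embedded control characters do not.
for candidate in ["日本語.txt", "bad\nname", "plain.txt"]:
    status = "invalid" if has_invalid_chars(candidate) else "ok"
    print(repr(candidate), status)
```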