NativeInstruments / ni-media

NI Media is a C++ library for reading and writing audio streams.
MIT License
235 stars 34 forks source link

Switch to a path interface instead of using `std::string` for file paths #51

Open ni-nkozlowski opened 3 years ago

ni-nkozlowski commented 3 years ago

On Windows, conversions between std::string and path interfaces results in UTF-8 strings being converted to ANSI, resulting in non-ANSI characters being lost in the conversion process. This results in files not being loaded correctly because they'll fail our valid path checks.

marcrambo commented 3 years ago

AFAIK these utf8 conversion problems are all fixed already. We added some extensive tests to ensure files with unicode characters can be loaded. The folder ni-media/audiostream/test/test_files/user_files contains all sorts of files with unicode characters... and the unit tests loads these files as expected

'L1b_mp3-123456789äöü□'$'\204''□'$'\226''□'$'\234''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.mp3'
'L1m4a-123456789äöü□'$'\204''□'$'\226''□'$'\234''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'').m4a'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.aac.m4a'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.aiff'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.alac.m4a'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.flac'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.mp3'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.ogg'
'unicode123456789äöü□'$'\204''□'$'\226''□'$'\234''□'$'\237''!@#$%^&_(□'$'\203\225''□'$'\202''□k□'$'\202''□□'$'\201\214'')-ah7Hfds.wav'

Perhaps the call site is missing the proper locale initialization? See https://github.com/NativeInstruments/ni-media/blob/master/audiostream/test/ni/media/test_helper.cpp#L46. This is needed to properly interface with windows APIs (which use utf16 encoded wchar_t). Deep down in boost::iostreams, the sources / sinks will convert char <-> wchar_t and expect the proper conversion to take place.

I am not sure switching to std::filesystem::path will change anything to that behavior.

Maybe we should document this as a requirement on Windows? Or maybe we should use our own facet inside of ni-media, but I don't know if this even works, and if it's a good idea. Tbh I lost a bit track about the current state of utf8 in cpp, and what's best practice. Actually, I just noticed that https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16 is already deprecated in C++17... that didn't last long :)

ni-nkozlowski commented 3 years ago

CC @C4rsten

Perhaps the call site is missing the proper locale initialization?

Yes. This is how we fixed it on the user side. If we use our own facet, it's one less detail the user has to worry about, so I would be up for that. Though it seems like we would have to roll our own, as you pointed out std::codecvt has already been deprecated. 🤷

There's a PR open right now related to this issue as well where a path interface was added, but it also swaps out the backend from using boost::filesystem to std::filesystem.

marcrambo commented 3 years ago

Yes I agree, this is a hidden requirement and totally not obvious for clients. I would treat this issue separately from the C++ 17 std::filesystem support though.

I would suggest the following approach:

  1. We add custom utf8 <-> utf16 conversion functions inside of ni-media and handle all the path <-> string conversions ourselves -> Clients will not need to imbue boost::filesystem globally anymore. However they will still be responsible for providing utf8 encoded path strings. Internally we will continue using boost::filesystem::path. ni-media will still be c++14 and clients stuck on that compiler version also benefit from this fix.
  2. We switch from boost::filesystem::path to std::filesystem::path internally. This issue might block us https://github.com/boostorg/iostreams/pull/110 from replacing it entirely
  3. We add the std::filesystem::path to the public interfaces (What Frank already started in #53)
franklange commented 3 years ago

Heyhey, thanks for looking into this!

I would like to help execute this but I'm not yet 100% clear on how these steps would look like exactly.

For Step 1 you're envisioning something like this? (really not sure about the formatting here)

class aiff_file_source : public aiff_source<boost::iostreams::file_descriptor_source>
{
    using base_type = aiff_source<boost::iostreams::file_descriptor_source>;
#if BOOST_OS_WINDOWS
    using converter = std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>;
#endif

public:
    explicit aiff_file_source( const std::string& path )
#if BOOST_OS_WINDOWS
    : base_type( converter{}.from_bytes(path), BOOST_IOS::binary | BOOST_IOS::in )
    {}
#else
    : base_type( path, BOOST_IOS::binary | BOOST_IOS::in )
    {}
#endif
};

Step 2 is then using fs::path internally:

class aiff_file_source : public aiff_source<boost::iostreams::file_descriptor_source>
{
    using base_type = aiff_source<boost::iostreams::file_descriptor_source>;

public:
    explicit aiff_file_source( const std::string& path )
    : base_type( std::filesystem::u8path(path), BOOST_IOS::binary | BOOST_IOS::in )
    {}
};

and Step 3 is then just using fs::path as the input param type as done in the PR but without using u8path.

Is that somewhat along the lines of what you had in mind or am I completely going in the wrong direction?

C4rsten commented 3 years ago

As stated initially I would avoid string conversions entirely. Instead on Windows use std::wstring and deprecate the std::string interface to not break compatibility. I agree with the following steps: adding std::filesystem support eventually.