JeffGarland / liaw2019-process

Repository for initial drafting of boost.process standards paper
MIT License
5 stars 4 forks source link

[Environment] Generic approach with process_env #63

Open klemens-morgenstern opened 3 years ago

klemens-morgenstern commented 3 years ago

I don't think we can rely on P1275R0 to be standardized (I think the global std::arguments will not be accepted) I think we should have a more generic approach to passing environments into a process, than relying on once class. We could add an initializer, e.g. process_env or have a full featured environment class.

The big need of environment functions arises on the side of the child process, i.e. when a child queries it's own environment. For example, searching for an executable that emulates the shell behaviour would be quite common for starting processes, but is quite different between platforms:

On windows, a certain file name is search in the PATH (or Path sometimes) variable with the addition of the extensions available in PATHEXT. On posix no such extensions exist, thus a function like std::vector<std::filesystem::path> std::this_process::env::find_executable(std::string & name). Similarly, one might want to get the default shell of the OS, which on posix is stored in SHELL while it's deducable through windir on windows. This could be handled by a std::filesystem::path std::this_process::env::shell() function.

It is to now however, that this information is queried before process start and thus usually queried from the father environment. Thus adding those specialized functions to an environment that is used to initialize a child process, should be considered of little to no priority. If another paper does provide such a class, we should support it.

If we discount specific values in an environment, what are the semantics ?

  1. It's a list of strings, in which every string contains at least one =
  2. The part before the first = is a unique key
  3. The part after the first = is the value, that is a : (posix) or ; (windows) seperated list.

With these three characteristics we can define what types are acceptable as an environment to be set by the process_env:

  1. range<convertible_to<string_view>>
  2. range<convertible_to<pair<string_view, string_view>>
  3. range<convertible_to<pair<string_view, range<string_view>>>

struct process_env {

    //string range
    template<ranges::range T>
        requires (
            requires(T t) {{*ranges::begin(t)} -> convertible_to<string_view>;}
            )
    process_env(T && t);

    //string pair range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> convertible_to<string_view>;}
            )
    process_env(T && t);

    //string pair vector range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> ranges::range;}
         && requires(T t) {{*ranges::begin(get<1>(*ranges::begin(t)))} -> ranges::range;}
            )
    process_env(T && t);

};

This this overloaded we can now use different containers, e.g. vector<string>, unordered_map<string, string> and unordered_map<string, vector<string>>. The environment class proposed in P1275R0 does fulfill the first constraint as well.

klemens-morgenstern commented 3 years ago

It might be a better idea to make process_env a template, which will allow more optimization for non-owning env, as might be given by a third party environment library.

klemens-morgenstern commented 3 years ago

Version for a string list:

template<typename R>
        requires (
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<   string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<  wstring_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to< u8string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<u16string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<u32string_view>;}
            )
struct process_env {
    process_env( const R & data);
    process_env( const R & data, const std::locale & loc);

    template<typename T>
    process_env( initializer_list<T> data);// -> process_env<initializer_list<T>> ;
    template<typename T>
    process_env( initializer_list<T> data, const std::locale & loc);// -> process_env<initializer_list<T>>;

};

template<typename T>
process_env( initializer_list<T> data) -> process_env<initializer_list<T>> ;
template<typename T>
process_env( initializer_list<T> data, const locale & loc) -> process_env<initializer_list<T>>;
klemens-morgenstern commented 3 years ago

The thing might work, but adding an initializer_list deduction guide for everthing is a bit intense. What we need is the capability to specialize for the native type, so there is overhead and maybe a initializer_list<string_view>. It does however make a log of sense to allow the env to be constructed from range<string>, map<string, string> and map<string, range<string>> . Furthermore, mixed forms to not need to be considered, beause the more complex can model the simple ones.

klemens-morgenstern commented 3 years ago

Alright, made it readable, will adapt it for #67. It is turning out big: convertible_to_string_view is just for readability

template<typename T>
concept convertible_to_string_view = 
        convertible_to<T,    string_view> ||  convertible_to<T,   wstring_view> || convertible_to<T,  u8string_view> || convertible_to<T, u16string_view> || convertible_to<T, u32string_view> ;

template<typename T>
    requires convertible_to_string_view<T> || 
        (requires (T t)
        {
            {tuple_size<T>{}} -> convertible_to<integral_constant<size_t,2u>>;
            {get<0>(t)} -> convertible_to_string_view;
            {get<1>(t)} -> convertible_to_string_view;
        } || 
        requires (T t)
        {
            {tuple_size<T>{}} -> convertible_to<integral_constant<size_t,2u>>;
            {get<0>(t)} -> convertible_to_string_view;
            {get<1>(t)} -> ranges::range;
            {*ranges::begin(get<1>(t))} -> convertible_to_string_view;
        })
class process_env {
public:
    process_env();

    process_env(initializer_list<T> r);
    process_env(initializer_list<T> r, const std::locale & loc);

    template<ranges::range Range>
        requires(convertible_to<ranges::range_value_t<Range>, T>)
    process_env(Range r);

    template<ranges::range Range>
        requires(convertible_to<ranges::range_value_t<Range>, T>)
    process_env(Range r, const std::locale & loc);

    template<input_iterator InputIt>
        requires(convertible_to<typename iterator_traits<InputIt>::value_type, T>)
    process_env(InputIt first, InputIt last);

    template<input_iterator InputIt>
        requires(convertible_to<typename iterator_traits<InputIt>::value_type, T>)
    process_env(InputIt first, InputIt last,  const std::locale & loc);

};

template<ranges::range Range>
process_env(Range r) -> process_env<ranges::range_value_t<Range>>;

template<ranges::range Range>
process_env(Range && r, const std::locale & loc)  -> process_env<ranges::range_value_t<Range>>;

template<input_iterator InputIt>
process_env(InputIt first, InputIt last) -> process_env<typename iterator_traits<InputIt>::value_type>;

template<input_iterator InputIt>
process_env(InputIt first, InputIt last, const std::locale & loc) -> process_env<typename iterator_traits<InputIt>::value_type>;

process_env(initializer_list<pair<   string_view,    string_view>> r) -> process_env<pair<   string_view,    string_view>>;
process_env(initializer_list<pair<  wstring_view,   wstring_view>> r) -> process_env<pair<  wstring_view,   wstring_view>>;
process_env(initializer_list<pair< u8string_view,  u8string_view>> r) -> process_env<pair< u8string_view,  u8string_view>>;
process_env(initializer_list<pair<u16string_view, u16string_view>> r) -> process_env<pair<u16string_view, u16string_view>>;
process_env(initializer_list<pair<u32string_view, u32string_view>> r) -> process_env<pair<u32string_view, u32string_view>>;

process_env(initializer_list<pair<   string_view, initializer_list<   string_view>>> r) -> process_env<pair<   string_view, initializer_list<   string_view>>>;
process_env(initializer_list<pair<  wstring_view, initializer_list<  wstring_view>>> r) -> process_env<pair<  wstring_view, initializer_list<  wstring_view>>>;
process_env(initializer_list<pair< u8string_view, initializer_list< u8string_view>>> r) -> process_env<pair< u8string_view, initializer_list< u8string_view>>>;
process_env(initializer_list<pair<u16string_view, initializer_list<u16string_view>>> r) -> process_env<pair<u16string_view, initializer_list<u16string_view>>>;
process_env(initializer_list<pair<u32string_view, initializer_list<u32string_view>>> r) -> process_env<pair<u32string_view, initializer_list<u32string_view>>>;
klemens-morgenstern commented 3 years ago

I got a rudimentary implementation of this process_env, which works, but I am starting to think it might be a stupid idea (probably because I spent the last three hours on it). All we really need would be to have the seperator of a list defined (i.e. : vs ;), so that a user can construct his own environment. That would be:

template<typename T>
concept convertible_to_string_view = 
        convertible_to<T,    string_view> ||  convertible_to<T,   wstring_view> || convertible_to<T,  u8string_view> || convertible_to<T, u16string_view> || convertible_to<T, u32string_view> ;

class process_env {
public:

    template<typename Char>
    constexpr static Char list_seperator = implementation-defined;

    template<convertible_to_string_view T>
    process_env(initializer_list<T> r);

    template<convertible_to_string_view T>
    process_env(initializer_list<T> r, const std::locale & loc);

    template<ranges::range Range>
        requires(convertible_to_string_view<ranges::range_value_t<Range>>)
    process_env(Range r);

    template<ranges::range Range>
        requires(convertible_to_string_view<ranges::range_value_t<Range>>)
    process_env(Range r, const std::locale & loc);

    template<input_iterator InputIt>
        requires(convertible_to_string_view<typename iterator_traits<InputIt>::value_type>)
    process_env(InputIt first, InputIt last);

    template<input_iterator InputIt>
        requires(convertible_to_string_view<typename iterator_traits<InputIt>::value_type>)
    process_env(InputIt first, InputIt last,  const std::locale & loc);

};

This is much simpler and should seamlessly work with a a third party environment library. Thoughts about that are highliy appreciated!

JeffGarland commented 3 years ago

apologies for being so far behind you on this. I think this is exactly the correct direction -- in particular the range based approach. Let me address a few points

klemens-morgenstern commented 3 years ago
JeffGarland commented 3 years ago
* This reflects the `locale` overloads in `std::filesystem::path`.

Ok. I haven't followed in detail the discussions about filesystem::path on the committee level, but it seems like it is able to do some magic w.r.t. this -- but maybe that's in a proposed replacement, will have to check. The good news is that since Elias and I have already briefed this a couple times we can get SG16 to tell us what they'd like us to do with this stuff to be compatible with the upcoming unicode support directions.

klemens-morgenstern commented 3 years ago

Right, whatever the current convention is works for me. We need to convert to wchar_t, i.e. UTF-16 on windows, I don't care how that's done.

marijanp commented 3 years ago

comment 723703925 is a good insight, and the creation of a process_env should be possible in that manner. But what if the user wants to manipulate the environment variables definitions computationally before creating a child-process? I suspect that most users will at some point create containers to combine/manipulate the environment variables contents. That results in string operations like splitting (O(N)) e.g. the PATH variables paths. That might happen only once. But joining them again to a string (O(N)) might occur more often, frustrating the user.

Another thing that comes in my mind is the question, what should happen if multiple definitions of the same environment variable are passed to the constructor?

klemens-morgenstern commented 3 years ago

If you manipulate the environment you will need to rebuild the environment for the child process anyhow. In linux it's a nullptr terminated char** list, on windows it's a string array, where the last element is an empty string. I.e. FOO=1\nBAR=2\n\n.

This means that you can get away without copies on linux, by just taking the pointer from a string_view, while you'll always copy a bunch on windows. The problem you talk about only occurs if the library accessing the current environment copies a lot. Passing everything in a string_views would allow preallocation though.

I did simplifiy the actually proposed version to just take a range of string_views and not the other stuff, so there would be no splitting going on unless explicitly requested by the users. That is unless he wants to modify the strings.

klemens-morgenstern commented 3 years ago

I would leave the handling of multiple keys up to the implementation. Could be an overwrite or a system_error I reckon.

marijanp commented 3 years ago

Reading all the comments I forgot what this issue was all about -> just the creation of a process env for the child...so you are right, the problem only occurs when the user accesses the environment of the current process and manipulates in some way.