ranges::to is significantly slower than direct construction

I expected these two implementations to perform about the same, but Google Benchmark says otherwise when run over 476k words: palindrome_range_to (range_to in the results) takes roughly twice as long as palindrome_range_common (range_common) in both runs below. Is this a performance bug in ranges::to, or have I made a mistake that stops the two functions from being equivalent?

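// Assumed setup (not shown in the original snippet): range-v3 plus a views alias.
#include <string>
#include <vector>
#include <range/v3/all.hpp>
namespace views = ranges::views;

auto palindrome_range_common(std::vector<std::string> const& words)
{
    // A non-empty word that reads the same forwards and backwards.
    auto is_palindrome = [](auto const& word) {
        return not ranges::empty(word)
           and ranges::equal(word, word | views::reverse);
    };

    // Append '!' to palindromes; copy every other word unchanged.
    auto palindrome_exclaim = [&is_palindrome](auto const& word) {
        return is_palindrome(word) ? word + '!' : word;
    };

    auto result = words
                | views::transform(palindrome_exclaim)
                | views::common;

    // Direct construction from the view's iterator pair.
    return std::vector<std::string>{
        ranges::begin(result),
        ranges::end(result)
    };
}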

auto palindrome_range_to(std::vector<std::string> const& words)
{
    // A non-empty word that reads the same forwards and backwards.
    auto is_palindrome = [](auto const& word) {
        return not ranges::empty(word)
           and ranges::equal(word, word | views::reverse);
    };

    // Append '!' to palindromes; copy every other word unchanged.
    auto palindrome_exclaim = [&is_palindrome](auto const& word) {
        return is_palindrome(word) ? word + '!' : word;
    };

    // Same pipeline, but materialized through ranges::to.
    return words
         | views::transform(palindrome_exclaim)
         | ranges::to<std::vector>; // range-v3 extension
}
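
For reference when checking that the two functions above really are equivalent, here is a minimal standard-library-only sketch of the same transformation. The source of the handrolled benchmark is not shown here, so this is an assumption about what such a baseline might look like, not the code that was actually measured; the name palindrome_handrolled is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical baseline (an assumption, not the benchmarked "handrolled" code):
// appends '!' to every non-empty palindrome and copies other words unchanged.
std::vector<std::string> palindrome_handrolled(std::vector<std::string> const& words)
{
    std::vector<std::string> result;
    result.reserve(words.size()); // one allocation up front, since the size is known

    for (auto const& word : words) {
        // Compare the word against itself read backwards.
        bool const is_palindrome = !word.empty()
            && std::equal(word.begin(), word.end(), word.rbegin());
        result.push_back(is_palindrome ? word + '!' : word);
    }
    return result;
}
```

Running all three functions over the same small input and comparing the resulting vectors would confirm that any timing difference comes from how the output vector is built rather than from a difference in what is computed.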

Running ./benchmark-palindromes-o2
Run on (2 X 2684.42 MHz CPU s)
CPU Caches:
  L1 Data 32K (x1)
  L1 Instruction 32K (x1)
  L2 Unified 256K (x1)
  L3 Unified 4096K (x1)
Load Average: 0.69, 0.63, 0.42
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
benchmark_solutions/handrolled     12443168 ns     12439882 ns         7113    # base
benchmark_solutions/algorithm      25840577 ns     25831195 ns         3255
benchmark_solutions/range_common   10816578 ns     10813486 ns         7474
benchmark_solutions/range_to       23025614 ns     23020478 ns         3303

Running ./benchmark-palindromes-o3
Run on (2 X 2684.42 MHz CPU s)
CPU Caches:
  L1 Data 32K (x1)
  L1 Instruction 32K (x1)
  L2 Unified 256K (x1)
  L3 Unified 4096K (x1)
Load Average: 0.05, 0.36, 0.47
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
benchmark_solutions/handrolled      9594576 ns      9591506 ns         8880    # base
benchmark_solutions/algorithm      21961494 ns     21955033 ns         3845
benchmark_solutions/range_common    9305111 ns      9303263 ns         9115
benchmark_solutions/range_to       21293927 ns     21286420 ns         3954

ericniebler/range-v3, issue #1337