boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit
383 stars 159 forks source link

line_pos_iterator now detecs underlying bidirectional iterator #757

Closed tobias-loew closed 1 year ago

tobias-loew commented 1 year ago

line_pos_iterator now detecs underlying bidirectional iterator and uses iterator-tag dispatching to select appropriate implementations of get_line_start (for bidirectional-iterators: linear-complexity in line-length). Breaking change: all functions now return underlying iterators instead of line_pos_iterators. But, since they are only used to get the bounds for the current line, returning a full line_pos_iterator seems unnecessary.

Kojoley commented 1 year ago

Could it be done without breaking public API maybe?

tobias-loew commented 1 year ago

There are pros and cons wrt. to the API break:

pros:

  1. get_line_start and get_current_line return underlying iterators which usually fit into registers
  2. get_line_start for forward-only iterators no longer needs to track the line-number, which is an overhead in the loop (though I haven't measured it)
  3. users get build errors and have to reconsider their code, which in the light of b0237057c804fa6a54871704f329a8b16bb97c42 should be done anyway

cons:

  1. user code gets broken
  2. get_line_start and get_current_line return less information than before and user have to write their own code, if they needed that information

IMHO pros 1. and 2. are desirable, and prohibiting the cons would be also. The results of get_line_start and get_current_line are usualy only used to mark the begin and end in the input. So, creating line_pos_iterators, when usually not needed violates the zero-abstraction principle.

So, one possible ways could be:

Kojoley commented 1 year ago
  1. get_line_start and get_current_line return underlying iterators which usually fit into registers
  2. get_line_start for forward-only iterators no longer needs to track the line-number, which is an overhead in the loop (though I haven't measured it)

If you want to work with underlying iterators and avoid overhead of line counting - you can obtain them with .base() method and pass them to get_line_start/get_current_line.

  1. users get build errors and have to reconsider their code, which in the light of b023705 should be done anyway

That commit fixed a bug, I don't understand what you meant.

tobias-loew commented 1 year ago
  1. get_line_start and get_current_line return underlying iterators which usually fit into registers
  2. get_line_start for forward-only iterators no longer needs to track the line-number, which is an overhead in the loop (though I haven't measured it)

If you want to work with underlying iterators and avoid overhead of line counting - you can obtain them with .base() method and pass them to get_line_start/get_current_line.

Yes, but that would require the user to know, that get_line_start/get_current_line is suboptimal. I think it's better to either invoke the optimal version or give at least an advice, that there is a faster way to do it.

  1. users get build errors and have to reconsider their code, which in the light of b023705 should be done anyway

That commit fixed a bug, I don't understand what you meant.

To make it clear: the fix is of course correct and necessary. But, the code was broken for CRLF input since the beginning, and nobody complained (or at least nobody fixed it) before. So, either users didn't check the outcome or they wrote their own fix. Nevertheless, a change of the (apparently wrong) semantics after so many years should be made visible to the user.

Kojoley commented 1 year ago

Yes, but that would require the user to know, that get_line_start/get_current_line is suboptimal. I think it's better to either invoke the optimal version or give at least an advice, that there is a faster way to do it.

What about get_column? It would be still suboptimal even after this PR.

I think the code would be much clearer if you did:

// optimized get_line_start for bidirectional iterators
// optimized get_column for bidirectional iterators

line_pos_iterator<Iterator> get_line_start(line_pos_iterator<Iterator> lower_bound,
                                           line_pos_iterator<Iterator> current)
{
    // No need in line number counting because it will be the same
    Iterator it = get_line_start(lower_bound.base(), current.base());
    return line_pos_iterator<Iterator>(it, current.position());
}

line_pos_iterator<Iterator> get_line_end(line_pos_iterator<Iterator> current,
                                         line_pos_iterator<Iterator> upper_bound)
{
    // No need in line number counting because it will be the same
    Iterator it = get_line_end(current.base(), upper_bound.base());
    return line_pos_iterator<Iterator>(it, current.position());
}

std::size_t get_column(line_pos_iterator<Iterator> lower_bound,
                       line_pos_iterator<Iterator> current,
                       std::size_t tabs)
{
    // No need in line number counting because it will be the same
    Iterator it = get_column(lower_bound.base(), current.base(), tabs);
    return line_pos_iterator<Iterator>(it, current.position());
}
tobias-loew commented 1 year ago

Compiling with VS 2022 x64-release (default options) the generated code for

            boost::iterator_range<pos_iterator_t> const range = get_current_line(input_begin, position, input_end);
            std::string const current(range.begin(), range.end());

and

            boost::iterator_range<std::string::const_iterator> const range = get_current_line(input_begin.base(), position.base(), input_end.base());
            std::string const current(range.begin(), range.end());

are almost identical. The optimizer removes the line_pos_iterator creation/destruction in the upper code.

tobias-loew commented 1 year ago

Thanks