Closed tobias-loew closed 1 year ago
Could it be done without breaking public API maybe?
There are pros and cons wrt. to the API break:
pros:
cons:
IMHO pros 1. and 2. are desirable, and prohibiting the cons would be also. The results of get_line_start and get_current_line are usualy only used to mark the begin and end in the input. So, creating line_pos_iterators, when usually not needed violates the zero-abstraction principle.
So, one possible ways could be:
- get_line_start and get_current_line return underlying iterators which usually fit into registers
- get_line_start for forward-only iterators no longer needs to track the line-number, which is an overhead in the loop (though I haven't measured it)
If you want to work with underlying iterators and avoid overhead of line counting - you can obtain them with .base()
method and pass them to get_line_start/get_current_line.
- users get build errors and have to reconsider their code, which in the light of b023705 should be done anyway
That commit fixed a bug, I don't understand what you meant.
- get_line_start and get_current_line return underlying iterators which usually fit into registers
- get_line_start for forward-only iterators no longer needs to track the line-number, which is an overhead in the loop (though I haven't measured it)
If you want to work with underlying iterators and avoid overhead of line counting - you can obtain them with
.base()
method and pass them to get_line_start/get_current_line.Yes, but that would require the user to know, that get_line_start/get_current_line is suboptimal. I think it's better to either invoke the optimal version or give at least an advice, that there is a faster way to do it.
- users get build errors and have to reconsider their code, which in the light of b023705 should be done anyway
That commit fixed a bug, I don't understand what you meant.
To make it clear: the fix is of course correct and necessary. But, the code was broken for CRLF input since the beginning, and nobody complained (or at least nobody fixed it) before. So, either users didn't check the outcome or they wrote their own fix. Nevertheless, a change of the (apparently wrong) semantics after so many years should be made visible to the user.
Yes, but that would require the user to know, that get_line_start/get_current_line is suboptimal. I think it's better to either invoke the optimal version or give at least an advice, that there is a faster way to do it.
What about get_column? It would be still suboptimal even after this PR.
I think the code would be much clearer if you did:
// optimized get_line_start for bidirectional iterators
// optimized get_column for bidirectional iterators
line_pos_iterator<Iterator> get_line_start(line_pos_iterator<Iterator> lower_bound,
line_pos_iterator<Iterator> current)
{
// No need in line number counting because it will be the same
Iterator it = get_line_start(lower_bound.base(), current.base());
return line_pos_iterator<Iterator>(it, current.position());
}
line_pos_iterator<Iterator> get_line_end(line_pos_iterator<Iterator> current,
line_pos_iterator<Iterator> upper_bound)
{
// No need in line number counting because it will be the same
Iterator it = get_line_end(current.base(), upper_bound.base());
return line_pos_iterator<Iterator>(it, current.position());
}
std::size_t get_column(line_pos_iterator<Iterator> lower_bound,
line_pos_iterator<Iterator> current,
std::size_t tabs)
{
// No need in line number counting because it will be the same
Iterator it = get_column(lower_bound.base(), current.base(), tabs);
return line_pos_iterator<Iterator>(it, current.position());
}
Compiling with VS 2022 x64-release (default options) the generated code for
boost::iterator_range<pos_iterator_t> const range = get_current_line(input_begin, position, input_end);
std::string const current(range.begin(), range.end());
and
boost::iterator_range<std::string::const_iterator> const range = get_current_line(input_begin.base(), position.base(), input_end.base());
std::string const current(range.begin(), range.end());
are almost identical. The optimizer removes the line_pos_iterator creation/destruction in the upper code.
Thanks
line_pos_iterator now detecs underlying bidirectional iterator and uses iterator-tag dispatching to select appropriate implementations of get_line_start (for bidirectional-iterators: linear-complexity in line-length). Breaking change: all functions now return underlying iterators instead of line_pos_iterators. But, since they are only used to get the bounds for the current line, returning a full line_pos_iterator seems unnecessary.