GNUAspell / aspell

http://aspell.net
GNU Lesser General Public License v2.1
236 stars 50 forks source link

Using end() instead of begin() + size() #606

Closed rishitc closed 3 years ago

rishitc commented 3 years ago

Updated functions that return pointer to the iterator to the one-after-the-last-element's position in the vector to reduce the number of pointer dereferencing operations and improve code clarity.

rishitc commented 3 years ago

Hi, The build error reported on CircleCI are for

sudo apt-get -y update

or for

sudo apt-get -y install make autopoint texinfo libtool intltool bzip2 gettext clang-3.5 clang-4.0 g++-multilib

They are not related to code test failures. I noticed this in all the builds that failed.

I only replaced the statements that get the pointer to the iterator to one-after-the-last-element using begin() + size() with end() in the vector class in the common folder.

Any inputs on why this would be causing code tests to fail, would be really helpful!

Thanks

kevina commented 3 years ago

I am fairly sure that dereferencing end() is undefined as it is past the end even though you don't actually read any memory.

kevina commented 3 years ago

It originally was just end() but was changed to back() + 1 (to avoid an out of bound error) then changed to begin() + size() (to avoid problems with empty vectors) see: https://github.com/GNUAspell/aspell-cvs/commit/d80c3af7c4b0f9feb5e20d7947c2c95ff0263162 and https://github.com/GNUAspell/aspell/commit/2f85a390d9264cc5c0c9d316650c3513dee6da2a.

Closing.

rishitc commented 3 years ago

Thanks, will take a look.

rishitc commented 3 years ago

Hi, Thank you so much for the references. After reading them, I gained a better understanding of how different alternatives were tried to implement the desired functionality.

I have one question though, why is begin() + size() any different than using end()?

Any inputs or feedback on why there is a difference between the two implementations, would be very helpful!

Thanks

kevina commented 3 years ago

I have one question though, why is begin() + size() any different than using end()?

The problem is when the iterator is dereferenced, from https://en.cppreference.com/w/cpp/container/vector/end": This element acts as a placeholder; attempting to access it results in undefined behavior".

kevina commented 3 years ago

Technically &*begin() is also undefined for an empty vector, but so far this has not caused a problem. The more correct solution would be to use this->empty() ? NULL : &*this->begin() instead of &*begin(), but that might have a slight performance impact.

C++11 provides a data() method to vector so the correct solution would be to just use that, but I want to keep C++98 compatibility for now.

I would accept a patch that redefines all relevant methods to use data() and then just not define data() if C++11 or better is detected via the __cplusplus macro.

rishitc commented 3 years ago

Hi, Thanks for the clarification. I was initially under the impression that we were trying to return an iterator from these methods (I had noticed the T * return type earlier, but I was not sure how it was working with the implementation then) After our discussion and trying out the Vector template class separately, I realized that we are returning a pointer to the one-after-the-last-element in the underlying container.

I'll start looking into the data() method in C++11 and how to implement the detection via the __cplusplus macro. The patch will only be an implementation change, right? or will the interface of the Vector template class have to change as well?

This is the first time, I'm working on a project so big. I have used GNU Aspell quite a few times and now trying to work on its source code, feels like a great way to contribute back to the project!

Shall I keep posting my updates with regards to the patch in this thread itself?

Thank you so much for your support.

rishitc commented 3 years ago

Hi, I've figured out how to use the data() method and the __cplusplus macro for this patch. My aim is to submit the patch by this weekend.

I have question, is it enough to verify that __cplusplus is not C++98 or C++03? As per this post [1], I feel that if I check if these standards are not used then I can confidently assume that the other standards specified will support data().

Any inputs or feedback on the implementations, would be very helpful!

Thanks

kevina commented 3 years ago

I have question, is it enough to verify that __cplusplus is not C++98 or C++03?

Just use _cplusplus >= 201103L.