JaroslawWiosna / regionalizer

How to choose the best capital of a state and regions? Divider into regions, especially for administrative purposes
Mozilla Public License 2.0

Remove leading and trailing spaces from tokens in `split` function #18

Closed JaroslawWiosna closed 6 years ago

JaroslawWiosna commented 6 years ago

split/DatabaseReader.cpp#L34 :

void split (std::string str, std::string splitBy, std::vector<std::string>& tokens) {
    tokens.push_back(str);
    std::size_t splitAt;
    std::size_t splitLen = splitBy.size();
    std::string frag;
    while (true) {
        frag = tokens.back();
        splitAt = frag.find(splitBy);
        if (splitAt == std::string::npos) {
            break;
        }
        tokens.back() = frag.substr(0, splitAt);
        tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
    }
}

This is an outstanding algorithm, but let's have a look below:

[osboxes@osboxes build]$ gdb ./regionalizer
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./regionalizer...done.
(gdb) set args -list ../database/Poland-Cities010-Top10-mock.txt 
(gdb) show args
Argument list to give program being debugged when it is started is "-list ../database/Poland-Cities010-Top10-mock.txt ".
(gdb) break main.cpp:28
Breakpoint 1 at 0x22ce8: file /home/osboxes/github/regionalizer/main.cpp, line 28.
(gdb) run
Starting program: /home/osboxes/github/regionalizer/build/regionalizer -list ../database/Poland-Cities010-Top10-mock.txt 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
printFlags
-list ../database/Poland-Cities010-Top10-mock.txt
Great! ../database/Poland-Cities010-Top10-mock.txt is being processed

Breakpoint 1, main (argc=3, argv=0x7fffffffe698) at /home/osboxes/github/regionalizer/main.cpp:28
28          for (City i : vec) {
(gdb) print i
$1 = {name = "`\033|UUU\000\000ase/Poland-Cities010-Top10-mock.txt", area = 93824994593800, population = 93824992457192, latitude = <error: Cannot access memory at address 0x32>, longitude = <error: Cannot access memory at address 0xd>, 
  distanceToTheFarthest = 93824992465568}
(gdb) n
29              std::cout << i.getAllFields();
(gdb) print i
$2 = {name = "Warsaw ", area = 517, population = 1702139, latitude = " 52.23 ", longitude = " 21.012", distanceToTheFarthest = 0}
(gdb) 

I am talking about the very last line:

name = "Warsaw "

...and there is a trailing space in the name field. It looks bad and may cause unpleasant issues in the future.
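
One possible way to address this is to trim each token after splitting. The sketch below is only an illustration: `trim` and `trimTokens` are hypothetical helpers that do not exist in the repository, and treating tabs and line endings as whitespace alongside spaces is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not in the repository): returns a copy of `s`
// without leading and trailing whitespace. Inner spaces are kept.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\r\n";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // empty or whitespace-only token
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: trims every token in place, e.g. right after split().
void trimTokens(std::vector<std::string>& tokens) {
    for (std::string& token : tokens) {
        token = trim(token);
    }
}
```

With this, a token such as "Warsaw " would become "Warsaw", while a multi-word name would keep its inner space.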

kermit10000000 commented 6 years ago

investigation started

kermit10000000 commented 6 years ago

Just to be clear: do you want the spaces removed only in the name field? Other fields also contain extra spaces, e.g. latitude = " 52.23 " and longitude = " 21.012". I checked the code on some examples and could not reproduce the problem with the split function itself, so the problem is most likely somewhere else.
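
A quick check of what `split` does with padded input, as a sketch: the function is copied from the top of this issue, while `splitSampleLine` and the sample line are assumptions (the database file is not shown in this thread; the field values come from the gdb output and the field order is a guess).

```cpp
#include <string>
#include <vector>

// Copied from the issue description.
void split(std::string str, std::string splitBy, std::vector<std::string>& tokens) {
    tokens.push_back(str);
    std::size_t splitAt;
    std::size_t splitLen = splitBy.size();
    std::string frag;
    while (true) {
        frag = tokens.back();
        splitAt = frag.find(splitBy);
        if (splitAt == std::string::npos) {
            break;
        }
        tokens.back() = frag.substr(0, splitAt);
        tokens.push_back(frag.substr(splitAt + splitLen, frag.size() - (splitAt + splitLen)));
    }
}

// Hypothetical check: splits an assumed database line padded around "|".
std::vector<std::string> splitSampleLine() {
    std::vector<std::string> tokens;
    split("Warsaw | 517 | 1702139 | 52.23 | 21.012", "|", tokens);
    return tokens;
}
```

Here `split` cuts exactly at the delimiter and copies everything between two delimiters unchanged, so any padding in the input line ends up in the tokens.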

kermit10000000 commented 6 years ago

Maybe getAllFields() was causing the problem, but I don't know whether it has changed since then.

kermit10000000 commented 6 years ago

I am revising the results of the investigation after discussing the problem further. I now know it was not about existing faults in our outstanding database, which is no surprise. It is about the case where there are additional spaces between the "|" signs. This can be handled easily by a function that, for every line read from the database, replaces all " " with "" until none are left. I will keep working to prepare that sort of function soon.
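
A minimal sketch of that idea, assuming a hypothetical helper named `removeAllSpaces` (not in the repository). Since `std::string::replace` works on a position range rather than on every occurrence, the sketch uses the erase-remove idiom to drop all spaces in a single pass instead of looping.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns `line` with every space character removed.
std::string removeAllSpaces(std::string line) {
    // std::remove shifts the kept characters to the front and returns the
    // new logical end; erase() then cuts off the leftover tail.
    line.erase(std::remove(line.begin(), line.end(), ' '), line.end());
    return line;
}
```

One caveat: this also removes spaces inside a field, so a multi-word name would be glued together. Removing only the leading and trailing spaces of each token would avoid that.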