djrieger / mjplusplus

A compiler for the MiniJava language
http://djrieger.github.io/mjplusplus/doc/doxygen/html/
6 stars 1 forks source link

Implement string table #12

Closed djrieger closed 9 years ago

djrieger commented 9 years ago
djrieger commented 9 years ago

Hey guys, does anyone have some suggestions on how to implement the string table? I was thinking of something like this:

class StringTable {
    private:
        std::unordered_set<std::string> strings;
    public:
        std::unordered_set<std::string>::iterator insert(std::string item);
};

unordered_set::insert() inserts a string and returns an iterator pointing to it or, if the string already exists in the set, returns an iterator pointing to this item. But we can't just replace Token::string_value with an std::unordered_set<std::string>::iterator right? What other type can we use to point to the elements in a set?

ratefuchs commented 9 years ago

Maybe we could return a pointer to the string or something like that. Then

djrieger commented 9 years ago

Yeah, regarding the pointer to a string contained in our set this sounds promising: http://stackoverflow.com/a/4070374 This sounds even more promising: http://stackoverflow.com/questions/5182122/pointers-to-elements-of-stl-containers. If I understand everything correctly, we could simply stick to a set of strings (no need for additional pointers) and keep a reference to std::unordered_set<std::string>::iteratorin our tokens since these iterators should stay valid as long as we don't remove elements from the string table.

Nidan commented 9 years ago
  • Check how to compare string reference equality in C++

by using the address-of operator (&): given T a, &b = a, &c = a; all of &a == &b, &a == &c and &b == &c will be true.

If I understand everything correctly, we could simply stick to a set of strings (no need for additional pointers) and keep a reference to std::unordered_set<std::string>::iterator in our tokens since these iterators should stay valid as long as we don't remove elements from the string table.

Almost. From http://www.cplusplus.com/reference/unordered_set/unordered_set/insert/ :

Iterator validity On most cases, all iterators in the container remain valid after the insertion. The only exception being when the growth of the container forces a rehash. In this case, all iterators in the container are invalidated.

References to elements in the unordered_set container remain valid in all cases, even after a rehash.

My current implementation in branch stringtable (which I forgot to push...) uses a mix of references and pointers. (References, since they can't be reseated nor NULL and because I have to touch less code, Pointers since references kill the copy assignment, which is needed for Token (no reseating).

Unless I find a safe and efficient way tomorrow, I'll need some help from the Lexer guys to integrate with operator/keyword lookup on Wednesday .