Refactor compiler skeleton

Changes:

Replaced old-style #ifndef header guards with #pragma once
Wrapped nodes in AST namespace - it's in general good practice to do this in projects, and I think it'll help encourage students to do this without being too confusing for them.
Replaced Node* stored by nodes to NodePtr = std::unique_ptr<const Node>. This change incorporates a couple elements:
1. Immutability of AST: The AST is constructed by the parser, and all the information a node needs is available once it's constructed. making everything const reduces the chances of programmer error
2. Switch from manual memory management to RAII: Using new and delete manually is very C-like (/ pre C++11). Although it's all that students are taught in Y1, I think understanding what a std::unique_ptr is is relatively straightforward, and encourages students to use a more modern interface.
3. Aliasing node storage type: by defining NodePtr as an alias for the stored type, it not only saves on characters, but makes it much easier for students to switch this type out with something else (like raw pointers) if they find the use of std::unique_ptr too confusing.
TypeSpecifier is no longer a Node - this makes sense imo because just a type specifier is never enough information for you to produce assembly, meaning that EmitRISC(...) will always have an empty / throwing implementation. Also switched to using an enum class instead of a std::string for hopefully obvious reasons.
Removed branches_ from Node - see here for reasoning
Fixed memory leaks - see #21. I'll probably make a separate PR into main to fix this since it's minimally invasive and it's kinda unfair to assess students on not leaking memory if we give them a leaky base.
- Added valgrind to the docker env to make it easier for students to use it
- [ ] Add docs explaining how to check for memory leaks
- [ ] Explicitly mention memory management as part of the 10% of marks given for "nice code"
[nit] (only style change) Changed types from T & to T& - I think it's much nicer to have the type information all together rather than separated by whitespace (sorry, I couldn't resist)

Other considered changes:

Adding more intermediate abstract classes - in the future, students may need to get information from special types of nodes, for example, getting the name of an identifier. Adding a virtual method to their base node class would be bad practice, as would dynamic casting the member, it would be better for the relevant class to store the derived node type directly / a more specialised abstract class (i.e. Expression). Unfortunately, with the basic skeleton we provide, there's no really meaningful intermediate abstract classes we can really provide, for example, we could make Return store a pointer to an Expression instead of a general node, but currently an Expression and a Node are exactly the same (it would also make it more difficult for them to switch out the pointer implementation they're using). In general, I'm hoping that TypeSpecifier is enough to show them that it's fine for different Nodes to store things which aren't Nodes / it's fine for them to add more types to their parser.
Reducing copying using move semantics - when constructing things like Identifier, it would be best practice to use move semantics to avoid copying the string unnecessarily. However, move semantics / r-values is definitely something I'd classify as a more advanced topic, and since we're not really concerned about performance, I don't think there's enough upside to include it to justify the extra burden on the students.

To address your points:

#pragma once seems like a less-error prone concept, but StackOverflow is still divided - do you have some insights from industry? The companies I worked for still used classic header guards.

From my experience in industry, #pragma once is basically universally seen as a much better solution than the old school header guards. Although some of the comments in the thread you link claim that #pragma once has "unfixable bugs", I think that in practice you're much more likely to have issues with the old header guards, by accidentally re-using names, either by error or due to libraries you include. It's well-supported by basically all the big compilers you'd use.

Namespace names should be lowercase according to Google Style Guide that we are following.

sadge. will update

NodePtr usage is inconsistent - there are still many places that use Node*.

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

New implementation of type_specifier is a bit advanced, but I do like the changes. Hopefully we can assume that people implementing more types than int would be proficient enough to understand the used constructs.

Given that they have to compile enums and switch statements, I'm sure we can trust them to figure out what an enum class is 😉. It's just a (safer) enum wrapped in a namespace after all.

I think type_specifier should still be split into .hpp and .cpp as currently all the implementation is in .hpp.

Given that type_specifier itself is just an enum class, and that the helper functions provided are relatively simple, I think it's fine (maybe even better) to leave it all in .hpp (the functions are constexpr, which solves any possible ODR issues). That said, if you have a particularly strong opinion, I can split it into a .cpp.

emplace_back needs a mention in the docs along with move semantics in general as it's certainly an advanced topic.

Can also touch on it, currently using it to ensure that NodeList builds without issues for both smart and raw pointers.

In terms of less intrusive changes to include, what are you thinking? I could add (in order of least to most intrusive):

Fix memory leaks
Formatting changes
NodePtr (but set toconst Nodeor evenNode` if you think that's too much
New type_specifier (a bit more invasive but quite valuable imo)
namespace ast? (not sure about this one)

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

Oh, so that means that all of these have to stay as Node*?

If so, then I think this is going to be too confusing for students:

Is there any way around it? Like some conditional NodePtr hackery for flex/bison or extracting raw pointers from unique_ptr etc.?

Btw, I've noticed some places still have the previous type style (same goes for files in compiler_tests/ and debugging/, but I don't think it's that important to be consistent there):

That said, if you have a particularly strong opinion, I can split it into a .cpp.

I think it'd be more consistent with existing files to split constexpr std::string_view ToString(TypeSpecifier type)

In terms of less intrusive changes to include, what are you thinking?

I'd say fixing memory leaks is definitely the most important to be merged into main. The other changes are less critical and they affect .cpp and .hpp, which I believe we should avoid modifying at this point to avoid confusion (while obviously pushing them to main_2024 and merging after the submissions as I do agree they're very valuable additions!).

Update:

NodePtr-ed everything (at the cost of adding some std::moves, definitely going to have to explain move semantics well now 😅. Although the new interface is definitely nicer / more correct, since it's very clear where ownership of resources is being transferred - I avoided this in my initial implementation to avoid the std::moves, but they may as well learn things properly.
Made ast lowercase
Not moving type_specifier functions into a .cpp file, as this is not possible for constexpr functions. I could make it non-constexpr but that's definitely worse, although I could be very easily convinced that the addition of move semantics is enough of a difficulty spike for students, and constexpr would confuse them too much.

LangProc / langproc-cw

Refactor compiler skeleton #22