This is a Project of The 42 Cursus Common Core in 1337 school that I worked on with my Teammate @Jeffy
The project consists of making an *almost* complete POSIX shell (Bash'ish) using barebone C (The only external library is readline
).
$ git clone --recursive git@github.com:YeGoRenji/minishell-1337.git && cd minishell-1337
$ make
$ ./minishell
[/tmp/lab/minishell-1337]
➤
Our Shell69 has the following features :
<, > and >>
|
$?
)echo, cd, pwd, export, unset and exit
<<
with signals&&
and ||
with parenthesis for prioritiesThe project follows this pipeline to make the code as organized as possible : \
Lexer
-> Parser
-> Expander
*-> Executor
(*) in case of precedence there is another
Expanding
that is done whileExecuting
.export TEST=42 && echo $TEST # Should output: 42
The Lexer (Tokenizer) 🔠 :
This is where the baby steps start. The main role of this step is :
- Identify Tokens : Example command is:
echo "Hello"$HOME && ls' -la' | > file cat
Using a linked list that has two pointers one for token next after space and after no space.
- Check for Lexing errors : Unclosed
'
or"
: Unknown character :The Parser 🧩 :
This part is where the tokens start having a structural meaning. The
Executor
's complexity is highly dependent on this part.The Recursive Descent Parser 🌲 🔄 :
The structure we used is a tree or more specifically its called Abstract Syntax Tree : This data structure is the one used for parsing programming languages, Its utility comes when Trying to define Syntactical errors. This is the example of the abstract syntax tree generated from the command above on Lexer. example : The command (executable + arguments) and the filenames are left as tokens so they are expanded later.
Another example :
echo "Salam" || ls -la && cat /etc/passwd | grep 'yego'
The Syntax analysis works as you go along parsing, for example finding a (
expects an expression (sub tree) then after that a )
Using recursion makes this process cleaner :
t_ast_cmd *parse_parenths(t_token **current)
{
t_ast_cmd *node;
node = NULL;
if (match(*current, (t_token_type[]){LPREN}, 1)) // match `(`
{
advance(current);
node = parse_cmd(current); // parse expression inside
if (!node || !match(*current, (t_token_type[]){RPREN}, 1)) // match `)`
return (free_ast(node), NULL);
advance(current);
}
node = subsh_node(node);
if (!node)
return (free_ast(node), NULL);
return (node);
}
This part becomes quite simple because of how things were setup on the last part (The Parser) Executing is just traversing (in-order Depth-first traversal) the AST. This is an animation that explains it all :
While executing whenever the executed node has a tokens list it will be expanded. Expanding is the action of :
$var
argv list
*
Now comes handy the fact that we still have the tokens from the Lexer, cause they have the information for whether to expand it or not.
In simple terms Expander takes a list of tokens and consumes them to a char **
args
that will be used depending on the node executed :
execve
syscall.As mentioned above the expanding is done while executing so that we have contextual variables change like this example :
export TEST=42 && echo $TEST # Should output: 42
Working on the minishell project has been an incredibly enriching experience for me. Throughout this journey, I've delved deep into the realms of system programming and shell scripting, honing my problem solving skills and understanding the intricacies of building a POSIX-compliant shell.