TeamShadow / shadow

Reference compiler for the Shadow programming language.
http://shadow-language.org/
Apache License 2.0
12 stars 8 forks source link

Threads implementation #42

Closed claude-abounegm closed 7 years ago

claude-abounegm commented 8 years ago

This pull request contains the full implementation of threads with message passing.

Summary of newly added entities in shadow

Classes

  1. shadow:utility@ThreadSafeList: A Thread safe list, using Mutual-Exclusion. This should only be used by the Thread class.
  2. shadow:natives@Mutex: Mutual-Exclusion implementation in Shadow.
  3. shadow:natives@Pointer: A Shadow implementation to hold native pointers.
  4. shadow:natives@Signaler: A way to wait, sleep, notify and interrupt threads.
  5. shadow:standard@Mailbox: The implementation which threads uses to perform message passing.
  6. shadow:standard@Message: A wrapper to the message used by the ::Mailbox.
  7. shadow:standard@MessageQueue: A queue with a size constrained by a lower and upper bound, implemented with a circular array. This is used by the ::Mailbox class.
  8. shadow:standard@Thread: The main implementation of Threads.
  9. shadow:standard@TimeSpan: Contains methods to manipulate time.
  10. shadow:utility@DefaultEqualizer: The default comparison implementation for Lists. Checks if a class implements CanEqual<T>, if it does, it uses that; if it does not implement CanEqual, it uses the === reference comparison to compare two items.
  11. shadow:utility@ReadOnlyList: A wrapper list to a normal list which does not allow any modifications to be performed on it.

    Interfaces

  12. shadow:standard@CanRun
  13. shadow:utility@Equate<T>

    Singletons

  14. shadow:standard@CurrentThread: A way to provide easy communication with the current thread.
  15. shadow:standard@Time: A singleton which contains useful methods for calculating TimeSpans, and for getting the current Epoch Time as a TimeSpan.

    Exceptions

  16. shadow:natives@FreedResourceException
  17. shadow:natives@MutexException
  18. shadow:standard@InterruptedException
  19. shadow:standard@InvalidOperationException
  20. shadow:standard@NoDataException: Thrown when Mailbox is empty and items are being retrieved from it asynchronously.
  21. shadow:standard@ThreadException
  22. shadow:standard@WrongTypeException
  23. shadow:utility@FullListException

    Tests

  24. shadow:test@GenericInterfaceTest
  25. shadow:test@MessageQueueTest
  26. shadow:test@EqualityComparerTest
  27. shadow:test@InterruptTest
  28. shadow:test@MessagePassingTest
  29. shadow:test@MailboxTest
  30. shadow:test@SignalerTest
  31. shadow:test@ThreadIsolatedRunnerTest
  32. shadow:test@TLSTest
  33. shadow:test@ThreadSleepTest
  34. shadow:test@ThreadTest

Implementation Details

Thread

  1. Thread is a class through which we can spawn new threads. Threads can have custom names, or automatically numbered names in the form "Thread#{id}" where {id} is the Shadow-specific thread ID number.
  2. Each Thread is spawned using a Runner class, which implements the interface CanRun. The interface contains one method run() which does not take any parameters and returns void. The compiler supports the spawn keyword now, and has the following overloads:
  1. A Thread instance (the current running thread) can be retrieved using CurrentThread->instance.
  2. The main Thread is a dummy thread since it is not spawned from the Shadow code. It is initialized in the main() method in Main.ll and NoArguments.ll.
  3. When a new thread is spawned, a thread can wait for it to terminate using the join() method in Thread.join(). The join() method can also take also have a timeout, and can be specified using milliseconds or TimeSpan. TimeSpan provides nanoseconds accuracy.
  4. A Thread can never call join on itself, and doing so will throw an InvalidOperationException. CurrentThread does not have a join method. i.e. CurrentThread->instance.join() is not a valid operation. All other joins are valid.
  5. Each Thread gets a unique ID on spawn, and no ID is repeated throughout the lifetime of the application.
  6. There is a unique singleton instance for each thread. Thus, if five threads are spawned, each thread would have a different instance of the System singleton.
  7. Each newly spawned thread is in an isolated environment and cannot see other variables outside the current class it's in. The whole Runner class (aka. the class which implements the interface CanRun), is cloned on spawn, and since each thread has a different reference to Singletons, no thread can see variables modified by other threads.
  8. Shadow waits for all the threads to terminate before terminating the main thread. If there are threads that have not been joined on or threw exceptions that were not handled, the program prints out Uncaught Exception messages concerning each thread that terminated due to one, and waits for all the other threads to terminate.

ThreadException

  1. Exceptions are handled correctly by threads. If a thread terminates because of an exception, the exception does not propagate to the main thread. Instead, it is kept until join() is called on that thread. The join is responsible to throw the correct exception. The exception thrown is ThreadException which contains information about all the threads that got terminated as a result of that exception, and the flow that it followed. ThreadTest.shadow demonstrates this functionality. The ThreadException holds the full exception stack and the threads that got terminated, in the order they were terminated.

CurrentThread

  1. CurrentThread is a singleton which has useful methods that could be called on the current thread. It is a different identity from Thread. To retrieve the actual Thread instance, there is an instance() getter. The CurrentThread singleton allows us to performs useful operations such as sleep, interrupt, and allows enumeration through the current thread's children.

Pointer

  1. The Pointer class is used to allocate and store the address of a C-struct. For example, C-threads need a pthread_t struct to be initialized and passed to pthread_create to spawn a new thread. Since Shadow cannot manipulate pointers, we use the Pointer class and pass that natively to LLVM. Check next bullet point.
  2. Classes such as Thread and Mutex currently use C pthreads to perform their functions. To communicate between Shadow and C code, we have a new class called Pointer. This is a managed environment which stores the handle pointer under the hood as a long. The Pointer class is accompanied with an LLVM method %void* @__extractPointer(shadow.natives..Pointer*). This allows us to pass the Pointer reference from Shadow to native LLVM code, and extract the actual pointer from it. This technically casts back the long stored in the pointer to a void* which can then in turn be casted to desired type. For more details, check Thread.shadow, Thread.native.ll, Mutex.shadow, Mutex.native.ll, Signaler.shadow, and Signaler.native.ll.
  3. Pointer class is used to initialize memory on the heap, using calloc. To initialize a Pointer class, we have to pass how many bytes we would like to allocate. This should be retrieved from LLVM using getelementptr. There is a convention followed to retrieve this size. Check Note 3.4.

Signaler

  1. The Signaler class is a class which contains methods: notify(), notifyAll(), interrupt(Thread), waitForNotify(), waitForNotify(TimeSpan), waitForNotify(int), sleep(TimeSpan), and sleep(int). Unlike Java, Signaler is FIFO respects the order in which the threads went to sleep, and wakes them up in that same order.

TimeSpan

  1. The TimeSpan is a class which can perform basic operations on time. It has nano-seconds accuracy, and can add and subtract from other timespans. It can also split time in days, hours, minutes, seconds, milliseconds, microseconds and nanoseconds, with their correct corresponding portions.

Time

  1. Time is a helper for TimeSpan which can initialize a TimeSpan from days, hours, minutes, seconds, milliseconds, microseconds, and nanoseconds. It also contains a method to retrieve the current epoch time. It also has an InfiniteTimeout, which represents infinite waiting time for threads and other waiting methods.

Tests

Test cases to prove the above functionality have been added to tests/.

Others

Note 1: libwinpthread-1.dll is required on Windows when compiled using MinGW with threads. The approach I used to solve this problem is static linking. The Configuration file, at the assembler part gcc, now has an extra option -static which allows static linking. This solves the library problem and does not cause much bloating to the file. The original file that required the DLL was around 500kbs, while the statically linked file was around 600kbs. Note 2: The MinGW required to compile Shadow with Threads can be found here. Note 3: There are several coding conventions that I followed for my code to be more self-explanatory, and are as follows:

  1. native methods are named like any other functions, and their definitions are implemented in the .native.ll.
  2. Any method which follows the pattern {name}Native (ex. createNative(), unlockNative(), etc..) are Shadow implemented helper methods, which are to be called solely from LLVM code. For example, a mutex.unlock(); call from LLVM could be quite cumbersome to write by hand, since we need to extract the variable mutex from the current instance, get the method table from the Mutex class, find the correct index of the unlock() method and call it. Instead, a wrapper function is written in Shadow, and the compiler can do all the work for you: unlockMutexNative() => () { mutex.unlock(); } generates the correct code, and unlockMutexNative can be directly called in LLVM.
  3. If a method modifies, or returns a variable which does not belong to that class, and is equivalent to a static variable to Java, the method or getter/setter starts with static. For example, native get staticNextId() => (int) is a method implemented in LLVM which atomically retrieves the next thread id and returns it. This variable is declared in LLVM and not in Shadow, and can be seen by all Thread instances. Thus, it starts with the word static.
  4. Each class which requires a Pointer instance, should implement a native getter method: handleSize(). If multiple handles are required, it is advised to create a new class which does a specific task with that pointer. For example, the Mutex class has its own function and its functions are not integrated with other native methods such as Threads. The handleSize() is then used in the constructor of the Pointer, Pointer:create(this->handleSize);. This is intended for portability: if for example, pthread_t has a different size across different operating systems, we can implement the handleSize() code in Windows.ll, Linux.ll and Mac.ll without causing any problems.
  5. The braces style I used follow the PHP style. Method definitions have the braces on a new line, while all other expressions such as if, else, and while have the braces on the same line.

Note 4: Basic parser/lexer modifications have been done, and spawn has been added to the grammar, alongside with some other variables. The spawn keyword is still not working, however. Note 5: The compiler has been modified to ignore any private unused method which ends with Native.

TODO1: Change native method signatures to match 64-bit architecture. A long in a 32-bit architecture, does not have the same number of bytes as the 64-bit one, in C. This can be solved by using pointers, instead of ints, and casting those to ints after. TODO2: Update Timeout in Mailbox class to be more efficient.