Add two multi-threaded examples for Julia

I added a directory with two multi-threaded examples for Julia:

One uses the low-level built-in Threads.@threads macro.
The other uses the FLoops.jl package, which makes multi-threaded reductions simpler to write and less error-prone: there is no index arithmetic to get right, and enabling SIMD instructions is much simpler.
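To illustrate the index arithmetic that the low-level approach involves, here is a minimal sketch of a chunked reduction with Threads.@threads. It is not the code from this PR: the function name `pi_threads`, the `nchunks` parameter, and the choice of integrand (the midpoint rule applied to 4 / (1 + x^2) on [0, 1]) are assumptions made for illustration.

```julia
# Hypothetical kernel, not the code from this PR: midpoint-rule estimate of pi
# as the integral of 4 / (1 + x^2) over [0, 1], with the slices split into
# contiguous chunks that are processed in parallel.
function pi_threads(n::Int, nchunks::Int = Threads.nthreads())
    h = 1.0 / n
    partial = zeros(Float64, nchunks)      # one result slot per chunk
    Threads.@threads for c in 1:nchunks
        # Index arithmetic: chunk c covers slices lo:hi, and the chunks
        # together cover 1:n exactly once.
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        s = 0.0                            # chunk-local accumulator
        @simd for i in lo:hi
            x = (i - 0.5) * h
            s += 4.0 / (1.0 + x * x)
        end
        partial[c] = s                     # single write per chunk
    end
    return sum(partial) * h
end

println(pi_threads(10_000_000))
```

Getting `lo` and `hi` wrong by one silently drops or double-counts a slice, which is the kind of mistake a higher-level reduction macro is meant to avoid. Results are indexed by chunk rather than by thread ID, so the sketch does not depend on how the scheduler assigns iterations to threads.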

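Before the timed run, each example makes a dummy call to the kernel with a small number of steps to warm up, so that Julia's just-in-time compilation of the kernel is not counted in the reported time.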
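The warm-up pattern can be sketched as follows. This is only an illustration under assumptions: `pi_kernel` is a hypothetical serial stand-in for the kernels in this PR, and timing with `time()` is a guess at how the examples measure elapsed time.

```julia
# Hypothetical serial stand-in for the kernels in this PR.
function pi_kernel(n::Int)
    h = 1.0 / n
    s = 0.0
    for i in 1:n
        x = (i - 0.5) * h
        s += 4.0 / (1.0 + x * x)
    end
    return s * h
end

# Warm-up: the first call compiles the method for Int arguments.
# The result is discarded.
pi_kernel(10)

n = 100_000_000
start = time()
estimate = pi_kernel(n)          # timed call reuses the compiled code
elapsed = time() - start

println("Obtained value of PI: ", estimate)
println("Time taken: ", elapsed, " seconds")
```

Without the warm-up call, the first timed invocation would include compilation, which can matter when the run itself takes only a few hundredths of a second.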

Benchmarks on Myriad:

$ OMP_NUM_THREADS=1 ./run.sh 
Calculating PI using:
  1000000000 slices
  1 thread(s)
Obtained value of PI: 3.1415926535898455
Time taken: 0.8806970119476318 seconds
$ OMP_NUM_THREADS=18 ./run.sh 
Calculating PI using:
  1000000000 slices
  18 thread(s)
Obtained value of PI: 3.141592653589797
Time taken: 0.05102109909057617 seconds

$ OMP_NUM_THREADS=1 ./run_floops.sh
   # Wait for installation of packages.......
Calculating PI using:
  1000000000 slices
  1 thread(s)
Obtained value of PI: 3.1415926535898437
Time taken: 0.8845429420471191 seconds
$ OMP_NUM_THREADS=18 ./run_floops.sh 
   # Go and brew another cup of coffee........
Calculating PI using:
  1000000000 slices
  18 thread(s)
Obtained value of PI: 3.14159265358979
Time taken: 0.05775904655456543 seconds

For comparison, this is the benchmark of the Fortran+OpenMP example compiled with ifort:

$ OMP_NUM_THREADS=1 ./run.sh 
rm -f *.o pi
make -f Makefile.intel
make[1]: Entering directory `/lustre/home/cceamgi/repo/pi_examples/fortran_omp_pi_dir'
ifort -O2 -xHost -o pi -fopenmp pi.f90
make[1]: Leaving directory `/lustre/home/cceamgi/repo/pi_examples/fortran_omp_pi_dir'
Calculating PI using:
                        1000000000 slices
                                 1 OpenMP threads
Obtained value of PI: 3.1415926536
Time taken:                0.87537 seconds
$ OMP_NUM_THREADS=18 ./run.sh 
rm -f *.o pi
make -f Makefile.intel
make[1]: Entering directory `/lustre/home/cceamgi/repo/pi_examples/fortran_omp_pi_dir'
ifort -O2 -xHost -o pi -fopenmp pi.f90
make[1]: Leaving directory `/lustre/home/cceamgi/repo/pi_examples/fortran_omp_pi_dir'
Calculating PI using:
                        1000000000 slices
                                18 OpenMP threads
Obtained value of PI: 3.1415926536
Time taken:                0.05136 seconds
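For a quick comparison, the speedup and parallel efficiency implied by the timings above can be computed with a few lines of Julia. The numbers are copied from the benchmark output; the variable names are hypothetical. The result is roughly a 17x speedup on 18 threads for both the Threads.@threads and the Fortran+OpenMP versions, and roughly 15x for the FLoops.jl version.

```julia
# Wall-clock times in seconds, copied from the benchmark output
# (name, time with 1 thread, time with 18 threads).
timings = [
    ("Julia Threads.@threads", 0.8806970119476318, 0.05102109909057617),
    ("Julia FLoops.jl",        0.8845429420471191, 0.05775904655456543),
    ("Fortran+OpenMP (ifort)", 0.87537,            0.05136),
]
threads = 18

# Speedup = single-thread time / multi-thread time.
speedups = [(name, t1 / tn) for (name, t1, tn) in timings]

for (name, s) in speedups
    println(rpad(name, 24), round(s; digits = 2), "x speedup, ",
            round(100 * s / threads; digits = 1), "% parallel efficiency")
end
```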
