Coarray enabling - GitHub Issues

szaghi commented 8 years ago

Hi all,

I found an hour to play with CAF (note that FoBiS now supports a new compiler, OpenCoarrays-GNU gfortran). I ran into some problems and, after reading @rouson's work, I think the current integrand ADT API must be slightly modified, but I would appreciate your opinions.

I think the best way (for reasons I will explain later) is to let concrete extensions of integrand embed CAF members. However, a type extension may contain a coarray component only if its parent type (abstract or not) also has one. For example, see @rouson's _coobject and its concrete extension _globalfield, implemented for his Burgers CAF test. As a consequence, I propose changing our integrand as follows:

type, abstract :: integrand
  !< Abstract type for building FOODIE ODE integrators.
#ifdef CAF
  class(*), allocatable :: dummy_to_allow_extension[:]
#endif
  contains
    ! public deferred procedures that concrete integrand-field must implement
    procedure(time_derivative),      pass(self), deferred, public :: t !< Time derivative, residuals.
    ! operators
    ....
endtype integrand

If I understand correctly, this should allow a concrete extension such as:

type, extends(integrand) :: integrand_concrete
#ifdef CAF
  real, allocatable :: U(:)[:]
#else
  real, allocatable :: U(:)
#endif
  contains
    ...
endtype integrand_concrete
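A compile-level sketch of the proposal, under stated assumptions: it is not FOODIE's actual code, the dummy coarray is integer instead of class(*) (any type serves, since it is never used), the preprocessor switches are dropped, and the deferred t returns a plain real array instead of FOODIE's real interface. It needs a coarray-enabled compiler, e.g. gfortran with -fcoarray=single, or the OpenCoarrays caf wrapper.

```fortran
module integrand_coarray_sketch
  ! Sketch of the proposal above; NOT FOODIE's actual code.
  ! Assumptions: integer dummy coarray instead of class(*), no #ifdef CAF,
  ! and t returns a plain real array.
  implicit none

  type, abstract :: integrand
    ! gives the parent a coarray ultimate component, so extensions may have one too
    integer, allocatable :: dummy_to_allow_extension[:]
  contains
    procedure(time_derivative), pass(self), deferred :: t
  end type integrand

  abstract interface
    function time_derivative(self) result(dU_dt)
      import :: integrand
      class(integrand), intent(in) :: self
      real, allocatable :: dU_dt(:)
    end function time_derivative
  end interface

  type, extends(integrand) :: integrand_concrete
    real, allocatable :: U(:)[:]   ! coarray component of the concrete field
  contains
    procedure, pass(self) :: t => concrete_t
  end type integrand_concrete

contains

  function concrete_t(self) result(dU_dt)
    class(integrand_concrete), intent(in) :: self
    real, allocatable :: dU_dt(:)
    dU_dt = -self%U                ! placeholder residual using this image's part of U
  end function concrete_t

end module integrand_coarray_sketch

program integrand_coarray_demo
  use integrand_coarray_sketch
  implicit none
  ! objects of a type with coarray components must be non-allocatable,
  ! non-pointer scalars and cannot be coarrays themselves
  type(integrand_concrete) :: field
  allocate(field%U(4)[*])         ! collective allocation of the coarray component
  field%U = 1.0
  print *, 'image', this_image(), 't =', field%t()
end program integrand_coarray_demo
```

Note the restriction in the demo program: because of the coarray component, integrand_concrete objects cannot be allocatable, arrays, or function results, which a design with coarray components inside integrand has to live with.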

Why embed coarrays?

Because solving PDEs on a domain split into multiple blocks involves, for example, global communication of boundary values. Consequently, a global communication is necessary whenever FOODIE solvers invoke %t().
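A minimal sketch of that boundary communication with an intrinsic-type coarray (hypothetical names, not FOODIE code): each image owns n cells of a 1D field plus one ghost cell per side and fetches its ghost values from the neighbouring images, the kind of exchange a CAF-aware %t() would have to perform. It assumes a coarray-enabled compiler (gfortran -fcoarray=single for one image, or OpenCoarrays' caf/cafrun for several).

```fortran
program halo_exchange_sketch
  ! Hypothetical 1D halo exchange: u(0) and u(n+1) are ghost cells.
  implicit none
  integer, parameter :: n = 4
  real, allocatable :: u(:)[:]
  integer :: me, np, i

  me = this_image()
  np = num_images()
  allocate(u(0:n+1)[*])              ! collective allocation, same bounds on every image
  u(1:n) = [(real(me*10 + i), i = 1, n)]
  u(0) = u(1)                        ! transmissive ghost values at the physical ends
  u(n+1) = u(n)

  sync all                           ! every image has written its interior cells
  if (me > 1)  u(0)   = u(n)[me-1]   ! left ghost  <- right boundary of left neighbour
  if (me < np) u(n+1) = u(1)[me+1]   ! right ghost <- left boundary of right neighbour
  sync all                           ! no image moves on before all gets are complete

  print '(a,i0,a,2f8.1)', 'image ', me, ' ghosts: ', u(0), u(n+1)
end program halo_exchange_sketch
```

Only intrinsic-type coarrays are used here, so remote addresses can be computed without asking the remote image.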

What do you think?

rouson commented 8 years ago

This looks good to me. It's the first time I've seen an unlimited polymorphic coarray. Nice idea. Given that it's just a dummy, there is no reason to choose a specific type.

szaghi commented 8 years ago

@rouson

Today I found an hour to work on the CAF-enabled Euler 1D test. I failed, maybe because of my inexperience or the little time I devoted to it.

I tried different approaches:

embedding a coarray into the euler_1D type:

type, extends(integrand) :: euler_1D
  ...
  real(R_P), allocatable :: U(:,:)[:]
  ...
endtype

wrapping the local euler_1D type into a global euler_1D_caf type:

type :: euler_1D_caf
  ...
  type(euler_1D), allocatable :: local[:]
  ...
endtype euler_1D_caf

a non-OO CAF object, declared directly in the main program of the test:

program euler_1D_caf
  ...
    type(euler_1D) :: domain[*]  
  ...
endprogram

With all of these approaches the compiler fails. I am using GNU gfortran 6.0 trunk (built with your script) with MPICH v3.1.4 as back-end and OpenCoarrays 1.1.0. Sometimes the compiler fails with an obscure "...Please submit a full bug report", sometimes with a clearer message: "Error: Sorry, coindexed coarray at (1) with allocatable component is not yet supported."

I think my problems stem from the fact that the euler_1D type has many allocatable components: the OpenCoarrays status page (https://github.com/sourceryinstitute/opencoarrays/blob/master/STATUS.md#compiler-issues-) notes, under GNU gfortran support, that "Derived-type coarrays with allocatable/pointer components are not yet handled properly".

I will upload my failing code soon, but I am now wondering whether it is better to try a non-OpenCoarrays CAF compiler, such as Intel's, to get a working CAF test quickly. What do you think?

rouson commented 8 years ago

Hi Stefano,

To be sure, there are applications in which the added flexibility of derived type coarrays with allocatable components is worth the cost, but it’s very important for users to at least understand that there is a cost. That understanding might inspire them to think of an alternative design. If I understand correctly, such was the case with the task-based parallelism approach developed by Michael Siehl (see https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/595365). He uses gfortran/OpenCoarrays to support a Multiple Program Multiple Data (MPMD) execution style that contrasts with the more typical Single Program Multiple Data (SPMD) style of most CAF codes. If I recall correctly, his problem is one that most naturally maps onto a design that employs derived type coarrays with allocatable components, but he came up with an alternative design to work around the lack of support for that feature and I think he might have benefited in the process. I hope I’m characterizing his situation and solution correctly. I’m cc’ing him to confirm.

Although OpenCoarrays implements a substantial portion of the Fortran 2008 coarray features and even supports some Fortran 2015 coarray features, one missing Fortran 2008 piece is support for allocatable components in derived-type coarrays. That is one piece that I’m hoping will ultimately be developed under contract with a commercial compiler vendor.

One reason I’m comfortable leaving that feature unsupported for now is the aforementioned performance issue. Coarray Fortran (CAF) is based on a symmetric memory model, wherein each image allocates a coarray of a local shape that conforms with the shape of the coarrays on remote images. When the coarray is of intrinsic type, the compiler can compute the address of remote coarray elements on other images and get/put those elements without coordinating with the relevant remote image. This makes efficient use of CAF’s one-sided memory model, which can in turn exploit hardware support for one-sided communication, such as Remote Direct Memory Access (RDMA) on InfiniBand interconnects; that can offer performance advantages relative to two-sided send/receive semantics.

By contrast, when the coarray is of derived type, more inter-image coordination is required. With allocatable components of derived type coarrays, each image can allocate components of a different size and shape. In such a case, the image that gets/puts the data will have to query the remote image for the appropriate memory address before it can get/put data to/from the remote image. That extra coordination hurts performance. Therefore, derived type coarrays with allocatable components should be used sparingly.

Damian

szaghi commented 8 years ago

@rouson

Hi Damian, thank you very much for these thoughts!

... such was the case with the task-based parallelism approach developed by Michael Siehl (see https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/595365). He uses gfortran/OpenCoarrays to support a Multiple Program Multiple Data (MPMD) execution style that contrasts with the more typical Single Program Multiple Data (SPMD) style of most CAF codes...

You are right: derived-type CAFs can enable an MPMD model, but they must be handled carefully. I will study Michael Siehl's work.

... One reason I’m comfortable leaving that feature unsupported for now is the aforementioned performance issue. Coarray Fortran (CAF) is based on a symmetric memory model, wherein each image allocates a coarray of a local shape that conforms with the shape of the coarrays on remote images. When the coarray is of intrinsic type, the compiler can compute the address of remote coarray elements on other images and get/put those elements without coordinating with the relevant remote image... By contrast, when the coarray is of derived type, more inter-image coordination is required. With allocatable components of derived type coarrays, each image can allocate components of a different size and shape... That extra coordination hurts performance. Therefore, derived type coarrays with allocatable components should be used sparingly.

I agree with all of this, and I am aware that the CAF model performs well only with an SPMD approach, while non-symmetric CAFs can lead to performance issues. However, allocatable components in derived types are one of the most powerful features of modern Fortran; I cannot live without them :-)

I have played with Intel Fortran CAF (my Intel compiler is v15.0.3). The code compiles and runs as expected with 1 image, but with 2 or more images I get a segfault when I try to access an allocatable component of the derived-type CAF. I think the Intel compiler also has some issues with derived-type CAFs that embed allocatable members. Note that my code is actually symmetric: all images allocate their own embedded allocatables with the same size. Nevertheless, I think the coordination issue you mentioned still matters.

I will study the references you sent me.

See you soon.

rouson commented 8 years ago

On Oct 29, 2015, at 1:24 AM, Stefano Zaghi wrote:

I agree with all of this, and I am aware that the CAF model performs well only with an SPMD approach, while non-symmetric CAFs can lead to performance issues. However, allocatable components in derived types are one of the most powerful features of modern Fortran; I cannot live without them :-)

Well, then always keep in mind that both gfortran and OpenCoarrays are open-source. Most gfortran developers are actually working as volunteers motivated by their own project needs. (Just FYI, on rare occasions, I’ve funded some gfortran work in small amounts, but probably nowhere near the scale that would be required to add this feature.) Either way, keep in mind the possibility of contributing development time.

D

szaghi commented 8 years ago

@rouson

Well, then always keep in mind that both gfortran and OpenCoarrays are open-source. Most gfortran developers are actually working as volunteers motivated by their own project needs. (Just FYI, on rare occasions, I’ve funded some gfortran work in small amounts, but probably nowhere near the scale that would be required to add this feature.) Either way, keep in mind the possibility of contributing development time.

I am aware of this. Indeed, I would like to tell you that you are great for supporting gfortran developers! Unfortunately, I do not have the skills to help them directly, otherwise I would. This is the reason (mainly an ethical one) why I publish most of my work as FOSS: the GNU community gives me a lot, so it is right that I give back everything I can. Unfortunately, improving gfortran is beyond my abilities :-(

See you soon.

szaghi commented 8 years ago

@rouson

I have uploaded my latest CAF test, see it here. It now compiles and works well when executed with 1 image. However, with 2 or more images I get a segmentation fault.

I use OpenCoarrays 1.1.0 with MPICH 3.1.4 and GNU gfortran 5.2.0 (I also tried the 6.0.0 experimental trunk, with the same failure).

The failing output is:

→ cafrun -np 2 build/tests/parallel/euler-1D-caf/euler-1D-caf --verbose --steps 20 -r
001> image 1 of 2
001> Number of total cells: 100
001> Number of time steps: 20
001> Save final results: T
001> Save plots of results: F
001> Save time serie of results: F
001> Left BC: TRA
001> Right BC: CON-02
002> image 2 of 2
002> Number of total cells: 100
002> Number of time steps: 20
002> Save final results: T
002> Save plots of results: F
002> Save time serie of results: F
002> Left BC: CON-01
002> Right BC: TRA
001> Space resolution: 0.100000000000000E-001
001> X(1) X(N): 0.500000000000000E-002 0.495000000000000E+000
002> Space resolution: 0.100000000000000E-001
001> Density value: +0.100000000000000E+001 +0.100000000000000E+001
002> X(1) X(N): 0.505000000000000E+000 0.995000000000000E+000
002> Density value: +0.125000000000000E+000 +0.125000000000000E+000

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 52538 RUNNING AT zaghi
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Remarks:

This version of the test does not use derived-type CAFs; the only CAFs present are of intrinsic type.

Do you have any suggestions?

rouson commented 8 years ago

Please reduce it to the minimum number of entities required to reproduce the problem. This is really important. When submitting problem reports, I make every effort to squeeze out every detail that is unnecessary for reproducing the problem. That includes eliminating kind parameters, dummy argument attributes (e.g., intent), only clauses, etc. I then stare really hard at the code and try to convince myself that each and every entity that appears is an absolute requirement to reproduce the problem. In fact, as a very final step (and only as a final step), I often even delete "implicit none" if it's not needed to reproduce the problem. Only then do I submit it for others to review.

The above approach shines a tightly focused laser on the problem and is generally much more likely to elicit a response and a bug fix if it's a compiler bug or a suggested rewrite if it's a coding error.

I'm never really happy unless I end up with something in the neighborhood of 20-30 lines or fewer.

The above approach has the additional benefit that I sometimes find the problem along the way and fix the code instead of submitting a bug report.

In parallel programming, as I'm sure you know, one important thing to consider is whether you have inserted the necessary coordination between images, usually in the form of "sync all" or "sync images". You can't access remote data unless you're certain the data is there, and you can never be certain without some form of implicit synchronization (e.g., allocation of a coarray) or explicit synchronization (e.g., the aforementioned sync statements).

D
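A minimal sketch of the synchronization point above (hypothetical program, not from FOODIE): image 1 puts a value into the coarray flag on every image, and the other images may only read it after an image control statement orders the put before the read. The first sync all matters too: without it, a late "flag = 0" on another image could overwrite image 1's put.

```fortran
program put_then_sync_sketch
  ! Hypothetical example: one-sided puts by image 1, reads by all images.
  implicit none
  integer :: flag[*]
  integer :: i

  flag = 0                     ! each image initializes its own copy
  sync all                     ! all initializations finish before image 1 writes
  if (this_image() == 1) then
    do i = 1, num_images()
      flag[i] = 42             ! one-sided put (i == 1 is a local write)
    end do
  end if
  sync all                     ! image 1's puts are complete before anyone reads
  if (flag /= 42) error stop 'remote data read before it was written'
  print '(a,i0,a,i0)', 'image ', this_image(), ' sees flag = ', flag
end program put_then_sync_sketch
```

Where only a few images interact, "sync images" with the relevant image list can replace the global "sync all".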


szaghi commented 8 years ago

@rouson

You are right, but in this case I think the minimal working example will not be so minimal: at least the FOODIE module with its abstractions cannot be trimmed out.

I will try to minimize the test.

What is surprising is that the CAF additions are minimal with respect to the serial code. I use explicit synchronization of the images as you suggested, and I completely avoid derived-type CAFs. It is worth noting that similar errors occur with the Intel compiler (without OpenCoarrays).

I will let you know about my progress.

rouson commented 8 years ago

I have encountered a few rare cases where I couldn't reduce the code, but only a very few over the more than 100 bug reports I've submitted over the past decade. For example, is everything in the module needed? Presumably not if you're using "only" clauses.

I quite often start with many hundreds of lines and end up with fewer than 50, but it takes a really extreme form of guerrilla warfare that slashes and burns everything in sight and takes no prisoners. :O

You'll likely be very surprised how far you can get.

I can help with the process if you like, but I'll only do it if we connect interactively via Skype or something similar during the process. That would be most efficient and save a lot of time.

D


szaghi commented 8 years ago

@rouson

You are too kind. I will try to minimize the test tomorrow during my lunch break. If things catch fire, you will hear me calling for help :-)

P.S. I think the test issue is related to my inexperience with CAF. There must be something very trivial that I have missed, because I am almost sure I have not used any cutting-edge or borderline CAF features.

szaghi commented 8 years ago

@rouson

I found the bug! The bug is me! Obviously...

Due to my inexperience I ran into one of the issues you mentioned: I forgot to explicitly allocate one of the CAFs, because I was misled by the elegance of Fortran's automatic (re)allocation of the left-hand side on assignment (I love it, I abuse it...). In this case the left-hand side was a CAF that had not been explicitly allocated, and a coarray is never automatically allocated by a simple assignment!

Tomorrow I hope to have a CAF test working.

Thank you very much!

rouson commented 8 years ago

On Oct 29, 2015, at 10:23 AM, Stefano Zaghi wrote:

@rouson I found the bug! The bug is me! Obviously...

Due to my inexperience I ran into one of the issues you mentioned: I forgot to explicitly allocate one of the CAFs, because I was misled by the elegance of Fortran's automatic (re)allocation of the left-hand side on assignment (I love it, I abuse it...). In this case the left-hand side was a CAF that had not been explicitly allocated, and a coarray is never automatically allocated by a simple assignment!

Bravo!

When the left-hand side is a coarray, be aware that no automatic (re)allocation happens, because that would require automatic synchronization, which could be a performance bottleneck. I really like the decisions the committee made to keep performance always the top priority.

D
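A minimal sketch of the pitfall discussed above (hypothetical names): an ordinary allocatable array is allocated automatically by intrinsic assignment, while an allocatable coarray must be allocated explicitly, by every image with the same bounds, before it can be assigned.

```fortran
program coarray_assignment_sketch
  ! Hypothetical example contrasting automatic allocation on assignment
  ! (ordinary allocatable) with explicit allocation (allocatable coarray).
  implicit none
  real, allocatable :: a(:)       ! ordinary allocatable array
  real, allocatable :: ca(:)[:]   ! allocatable coarray

  a = [1.0, 2.0, 3.0]             ! OK: a is allocated automatically with size 3
  ! ca = a                        ! wrong: ca is not allocated and is never allocated
  !                               ! by assignment; this may crash at run time
  allocate(ca(size(a))[*])        ! explicit, collective allocation: every image uses
                                  ! the same bounds, and all images synchronize here
  ca = a                          ! OK: ca is allocated and has the same shape as a
  print *, 'image', this_image(), ':', ca
end program coarray_assignment_sketch
```

The same rule forbids reallocating an allocated coarray through assignment: the right-hand side must conform to its current shape.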

rouson commented 8 years ago

To be sure, there are applications in which the added flexibility of derived type coarrays with allocatable components is worth the cost, but it’s very important for users to at least understand that there is a cost. That understanding might inspire them to think of an alternative design. If I understand correctly, such was the case with the task-based parallelism approach developed by Michael Siehl (see https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/595365). He uses gfortran/OpenCoarrays to support a Multiple Program Multiple Data (MPMD) execution style that contrasts with the more typical Single Program Multiple Data (SPMD) style of most CAF codes. If I recall correctly, his problem is one that most naturally maps onto a design that employs derived type coarrays with allocatable components, but he came up with an alternative design to work around the lack of support for that feature and I think he might have benefited in the process. I hope I’m characterizing his situation and solution correctly. I’m cc’ing him to confirm.

Hi Damian, your understanding is right, except that I use the alternative design mainly because of the disadvantages of coarrays of derived type with allocatable components, most of which you've already stated in your mail; see also my point (2) below. I also agree with all your further statements in this e-mail. Let me add the following:

(1) In my own programming, OpenCoarrays/GFortran has advantages (e.g. performance) and complementary features (e.g. run-time error messages) compared with the Intel compiler, which makes it an invaluable tool for anyone doing serious Coarray Fortran/PGAS programming these days. I am using both compilers simultaneously with the same source code files and would recommend this as an ideal setting. In my own programming, OpenCoarrays/GFortran is definitely first class.

(2) The use of coarrays of derived type with allocatable components did require more parallel logic code in my own programming. That is because the programmer cannot (directly) allocate remote memory but at the same time is responsible for avoiding access to non-allocated remote memory, in order to avoid severe run-time crashes. That is also why I now use only coarrays of derived type with static components in my programming. This may lead to derived-type coarray array components that are too small. Nevertheless, coarrays are intended for remote data transfer only and can be reused multiple times for this purpose. (OK, to be honest, this also requires some parallel logic code, which I have not developed yet. Nevertheless, the use of symmetric coarrays simply has too many advantages.)

(3) Coarray teams (Fortran 2015) will allow coarray declarations to be limited to specific teams (I remember this from a paper, I think by John Reid, which I do not have at hand right now). This will be another means to save coarray/PGAS memory in MPMD-like programming.

best regards Michael
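A minimal sketch of Michael's point (2), under stated assumptions: the names transfer_buffer and max_len are hypothetical, and the fixed capacity is arbitrary. Because the derived-type coarray has only static components, every image has the same memory layout, so a remote component can be addressed directly; a length field records how much of the fixed-size buffer is valid, and the buffer is reused for each transfer.

```fortran
program static_buffer_sketch
  ! Hypothetical derived-type coarray with static components only.
  implicit none
  integer, parameter :: max_len = 8           ! assumed fixed capacity of the buffer
  type :: transfer_buffer
    integer :: n = 0                          ! number of valid entries in values
    real    :: values(max_len) = 0.0          ! fixed-size component, reused per transfer
  end type transfer_buffer
  type(transfer_buffer) :: buf[*]
  integer :: me, left, right, k

  me = this_image()
  right = merge(1, me + 1, me == num_images())   ! ring neighbours
  left  = merge(num_images(), me - 1, me == 1)

  k = min(me, max_len)                 ! a message of variable length, at most max_len
  buf[right]%values(1:k) = real(me)    ! one-sided put into the right neighbour's buffer
  buf[right]%n = k
  sync all                             ! all puts complete before anyone reads its buffer

  ! each image now holds its left neighbour's message in its own buf
  print '(a,i0,a,i0,a,i0)', 'image ', me, ' received ', buf%n, ' values from image ', left
end program static_buffer_sketch
```

The price is the one Michael mentions: a message longer than max_len does not fit, so real code needs extra logic to split or reject it.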


szaghi commented 8 years ago

@rouson thank you very much, very useful hints!

My opinions are:

(1) In my own programming, OpenCoarrays/GFortran has advantages (e.g. performance) and complementary features (e.g. run-time error messages) compared with the Intel compiler, which makes it an invaluable tool for anyone doing serious Coarray Fortran/PGAS programming these days. I am using both compilers simultaneously with the same source code files and would recommend this as an ideal setting. In my own programming, OpenCoarrays/GFortran is definitely first class.

I totally agree! I routinely use several compilers with bullet-proof debugging flags during tests. Unfortunately, I currently have access to neither IBM XLF nor Cray, but, like Michael, I am using GNU gfortran (with OpenCoarrays) and Intel compilers. OpenCoarrays definitely rocks!

(2) The use of coarrays of derived type with allocatable components did require more parallel logic code in my own programming. That is because the programmer cannot (directly) allocate remote memory but at the same time is responsible for avoiding access to non-allocated remote memory, in order to avoid severe run-time crashes...

I am aware of this. Nevertheless, I still find the practical impossibility of using derived-type CAFs with allocatable components frustrating, because:

when the programmer ensures that the CAF is allocated symmetrically and writes an SPMD program, no particular parallel logic must be added with respect to a derived-type CAF with static components (this was my first attempt);
when an MPMD model is required, derived-type CAFs with allocatable components could make the code more concise and clear, even if more parallel logic is necessary; any resulting performance loss must be handled carefully by the programmer (more than in the SPMD model), but this is not really a problem.

I am very curious to study how Michael achieves MPMD code: are there any freely available publications about this?

Thank you very much!

szaghi commented 8 years ago

@rouson @milancurcic and all,

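The CAF-enabled Euler 1D test is here (https://github.com/Fortran-FOSS-Programmers/FOODIE/tree/master/src/tests/parallel/euler-1D-caf); it works as expected and... it seems to scale almost linearly on my workstation! Great OpenCoarrays!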

I still have to analyze the scaling and the absolute computational times more carefully, and I have to implement an identical CAF test without FOODIE (as I did for the OpenMP test), but the hardest part is done!

As mentioned, on November 2 and 3 I will be traveling across Italy, but I should be able to connect early in the morning and late in the evening on those days too. Let me know when you want to talk.

Have a good weekend.

rouson commented 8 years ago


On Oct 30, 2015, at 9:05 AM, Stefano Zaghi notifications@github.com wrote:

@rouson @milancurcic and all,

The CAF-enabled Euler 1D test is here https://github.com/Fortran-FOSS-Programmers/FOODIE/tree/master/src/tests/parallel/euler-1D-caf; it works as expected and... it seems to scale almost linearly on my workstation! Great OpenCoarrays!

I’m glad to hear it.

I still have to analyze the scaling and the absolute computational times more carefully, and to implement an identical CAF test without FOODIE (as I did for the OpenMP test), but the hardest part is done!

And the best part is how quickly you accomplished it. As parallel computing has moved from the supercomputer center down to the multicore laptop and now that we finally have a parallel programming language that offers high performance with such a straightforward syntax and semantics, it is increasingly the case that “serial code is legacy code.” And because nature is inherently parallel to a large degree, I think we should all switch from asking ourselves “How do I parallelize this algorithm?” to asking ourselves “Why would I ever serialize this process?” I’m of the opinion that it’s time to start teaching scientific programming as parallel programming from the very start. I first got the idea from Peter Pacheco’s book An Introduction to Parallel Programming. His book aims to reach sophomore-level undergraduates and assumes only that they know C before taking the class.

As anticipated, on 2nd and 3rd November I will be traveling across Italy, but I should be able to connect early in the morning and late in the evening on those days too. Let me know when you want to talk.

My availability shows at http://rouson.youcanbook.me and I am always fine with people putting appointments on my calendar via that tool. I give up some of the freedom of controlling my calendar in exchange for eliminating the hassle of controlling my calendar.

Damian

szaghi commented 8 years ago

@rouson

As parallel computing has moved from the supercomputer center down to the multicore laptop and now that we finally have a parallel programming language that offers high performance with such a straightforward syntax and semantics, it is increasingly the case that “serial code is legacy code.” And because nature is inherently parallel to a large degree, I think we should all switch from asking ourselves “How do I parallelize this algorithm?” to asking ourselves “Why would I ever serialize this process?” I’m of the opinion that it’s time to start teaching scientific programming as parallel programming from the very start.

I absolutely agree!

My availability shows at http://rouson.youcanbook.me and I am always fine with people putting appointments on my calendar via that tool.

Wow! You come from Mars!

See you soon.

rouson commented 8 years ago

On Oct 30, 2015, at 9:25 AM, Stefano Zaghi notifications@github.com wrote: Wow! You come from Mars!

Believe it or not, I do often feel that way — especially on days when I drive an electric vehicle with a folding bike in the (front) trunk and a surfboard on top!

:D

milancurcic commented 8 years ago

@szaghi :+1: :+1: :+1: !!!

szaghi commented 8 years ago

@milancurcic I am monitoring Damian's free slots for a talk with us. There are some possibilities next week: November 10-11-12 from 12:00 to 13:30 AM Rome time. Are any of these slots fine for you too? In Miami, 12:00 AM Rome time should be very early (6:00 AM?). Alternatively, Damian has some slots in the same week at 5:30 PM Rome time that are fine for me and not so early for you. Let me know your preference (including other weeks, not only next week).

See you soon.

milancurcic commented 8 years ago

@szaghi Hi Stefano, I'm a bit confused. I think that on Nov 10-11-12 the open slots in Rome time are 12:00-1:30 AM. This is after midnight for you and late afternoon for me. It would work for me but probably not very well for you. I will let you decide. I won't be traveling or have big meetings in November, so please just go ahead, pick a slot that works for you and let me know. Thank you!

szaghi commented 8 years ago

@milancurcic oppsss.... the format 12:00 AM is not 24:00.... the confused man is me!

Ok, I will recheck more carefully during this weekend. Thank you Milan!

rouson commented 8 years ago

Although it doesn’t show on my calendar, I’m ok with telephone calls in the late evening, say 9 PM to midnight U.S. Pacific time.

Damian

On Nov 6, 2015, at 9:19 AM, Stefano Zaghi notifications@github.com wrote:

@milancurcic oppsss.... the format 12:00 AM is not 24:00.... the confused man is me!

Ok, I will recheck more carefully during this weekend. Thank you Milan!


szaghi commented 8 years ago

@milancurcic I am very confused now... looking at this conversion table, it is not clear whether 12:00 AM corresponds to noon or to midnight. If I understand right, 12:00 AM Rome time means noon in Rome, right?

szaghi commented 8 years ago

@rouson @milancurcic

Am I correct in assuming that 12:00 PM => midnight and 12:00 AM => noon?

milancurcic commented 8 years ago

@szaghi No, it is the other way around :)

On Fri, Nov 6, 2015 at 12:26 PM, Stefano Zaghi notifications@github.com wrote:

@rouson @milancurcic

Am I correct in assuming that 12:00 PM => midnight and 12:00 AM => noon?


szaghi commented 8 years ago

@milancurcic :-) perfect! Thank you very much...

Now I understand how a metric conversion misunderstanding destroyed a Mars mission...

szaghi commented 8 years ago

@milancurcic

so is this table wrong? http://www.calculatehours.com/Military_Time_Conversion_Table_Sheet.html

szaghi commented 8 years ago

No, OK, I see my error!

12:00 PM => noon, 12:00 AM => midnight

rouson commented 8 years ago

Yes
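The rule the exchange above settles on (12:00 PM is noon, 12:00 AM is midnight) fits in a one-line conversion. Below is a small sketch with a hypothetical to_24h function. Taking the hour modulo 12 sends 12 to 0, and PM then adds 12, so both edge cases come out right without a special branch.

```fortran
module clock12
  implicit none
contains
  ! Convert a 12-hour clock hour (1..12) plus an AM/PM flag to a 24-hour hour (0..23).
  pure integer function to_24h(hour12, is_pm) result(hour24)
    integer, intent(in) :: hour12
    logical, intent(in) :: is_pm
    hour24 = mod(hour12, 12)          ! 12 maps to 0 (so 12 AM -> 0, midnight) ...
    if (is_pm) hour24 = hour24 + 12   ! ... and PM adds 12 (so 12 PM -> 12, noon)
  end function to_24h
end module clock12
```

For example, the 12:00-1:30 AM Rome slots are 00:00-01:30 in 24-hour time, just after midnight, which matches Milan's reading, while 5:30 PM Rome time is 17:30.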

zbeekman commented 8 years ago

oh, man, I am trying to catch up! SO much ground was covered.

If I understand this thread and the ADT Calculus Pattern correctly, this also answers one of @milancurcic's questions from our call (he asked something like: "why does FOODIE need to know about CAF/parallelism, since it is just a time integration library and it is the spatial operators that need to know about parallelism..."). The integrand type is extended by the concrete type, which holds the application-specific implementation; therefore, if the application has coarray components, the integrand type must have at least a dummy coarray component. Is this correct?

szaghi commented 8 years ago

@zbeekman yes, your conclusion is correct. I modified the ADT API (by means of a preprocessor flag that is disabled by default) just to allow concrete extensions to have coarray components: the standard allows this only if the abstract parent has a coarray component too. However, the parallel benchmark I made does not use this feature: I found that current gfortran support for components of derived type being coarray is somewhat lacking/buggy, so I prefer to avoid coarray components and use coarrays of intrinsic type as communication buffers, see this.
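Here is a minimal sketch of the "coarrays of intrinsic type as communication buffers" approach, for a 1D domain split across images. It assumes ng ghost cells on each side of a local array and periodic boundaries. The module and procedure names (halo_exchange, init_buffers, exchange_ghosts) are hypothetical, and this is not the code in the FOODIE euler-1D-caf test. The local field stays an ordinary array, so no derived-type coarray or coarray component is involved.

```fortran
module halo_exchange
  implicit none
  private
  public :: init_buffers, exchange_ghosts

  ! Intrinsic-type coarray buffers: no derived-type coarrays or coarray components.
  real, allocatable :: left_buf(:)[:]   ! filled by the left neighbour, read into my left ghosts
  real, allocatable :: right_buf(:)[:]  ! filled by the right neighbour, read into my right ghosts

contains

  subroutine init_buffers(ng)
    integer, intent(in) :: ng
    ! ALLOCATE of a coarray is collective: every image must execute it with the same ng.
    allocate(left_buf(ng)[*], right_buf(ng)[*])
  end subroutine init_buffers

  subroutine exchange_ghosts(u, n, ng)
    integer, intent(in) :: n, ng
    real, intent(inout) :: u(1-ng:n+ng)   ! n interior cells plus ng ghosts per side
    integer :: me, left, right

    me = this_image()
    left  = merge(num_images(), me - 1, me == 1)   ! periodic left neighbour
    right = merge(1, me + 1, me == num_images())   ! periodic right neighbour

    sync all                           ! neighbours have consumed the previous buffer contents
    right_buf(:)[left] = u(1:ng)       ! my first interior cells -> left neighbour's right ghosts
    left_buf(:)[right] = u(n-ng+1:n)   ! my last interior cells  -> right neighbour's left ghosts
    sync all                           ! all puts are complete before anyone reads its buffers

    u(1-ng:0)   = left_buf(:)
    u(n+1:n+ng) = right_buf(:)
  end subroutine exchange_ghosts

end module halo_exchange
```

The first sync all keeps an image from overwriting a neighbour's buffer before that neighbour has copied out the previous values. With a single image, the periodic wrap simply copies the domain's own edge cells into its ghost cells.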

zbeekman commented 8 years ago

I see. Yes, so far I have only been able to use coarray components of intrinsic type.

@szaghi At some point... maybe for another project, and after I have had a chance to revisit it, I should loop you in on some of the work I had been doing with Damian about automatic data dependency tracking for differentiation operators with CAF. But this is beyond the scope of FOODIE and there are many things to do first.

szaghi commented 8 years ago

@zbeekman sounds very interesting, you guys do very nice work!

rouson commented 8 years ago

@szaghi and @zbeekman, I suspect you are speaking past each other. If I understand correctly, @szaghi is referring to gfortran's lack of support for allocatable components of derived type coarrays:

type foo
  real, allocatable :: stuff
end type

type(foo) :: bar[*] ! gfortran can't handle this yet because of the allocatable component

whereas @zbeekman is referring to "components of derived type being coarray":

type bar
end type

type foobar
   type(bar), allocatable :: junk[:] ! a coarray component must be allocatable, with deferred codimension
end type

Am I correct?

Damian

szaghi commented 8 years ago

@rouson Ohhh yes, I was referring precisely to allocatable components of derived-type coarrays (type(foo) :: bar[*]), but I steered Zaak toward the other issue by talking about the modified API of our ADT, which allows possible coarray components in concrete extensions... sorry for my confusing post!

zbeekman commented 8 years ago

@rouson thanks for the clarification. I was originally just talking about the current implementation and comments from our call, but now I can follow this thread, and understand the second issue: allocatable components of coarray derived type objects...

szaghi commented 8 years ago

This test now works well. I will open another issue for the upcoming new 2D tests.

Fortran-FOSS-Programmers / FOODIE

Coarray enabling #26
