Warwick-Plasma / epoch

Particle-in-cell code for plasma physics simulations
https://epochpic.github.io
GNU General Public License v3.0

Again fields in diagnostics.F90 #482

Closed DanRRRR closed 1 year ago

DanRRRR commented 1 year ago

I just posted this at the end of the closed Issue #481 and I'm not sure it is visible, so I'm re-posting it here.

I have found something even stranger related to the fields problem, again in this subroutine in diagnostics.F90. It does not require looking any deeper than the surface and takes just two one-minute runs.

Let's just add the following PRINT* line to this subroutine, before or after this line (around line 393), it does not matter:

    CALL sdf_open(sdf_handle, full_filename, comm, c_sdf_write)

The line we will add tells us which indices of the 3D E-field array each core is dealing with when the fields are saved:

    print*, 'cell_x_min(rank+1), cell_x_max(rank+1), cell_y_min(rank+1), ' // &
        'cell_y_max(rank+1), cell_z_min(rank+1), cell_z_max(rank+1) =', &
        cell_x_min(rank+1), cell_x_max(rank+1), cell_y_min(rank+1), &
        cell_y_max(rank+1), cell_z_min(rank+1), cell_z_max(rank+1)

and let's run the standard 3D demo deck called cone.deck from the epoch3d samples. With 1 core we of course get the entire array, and the initial and final indices for the x, y and z dimensions look like this: 1 250 1 250 1 250, because nx, ny and nz are all 250 in the initial settings.

Now let's run with 2 cores. It is natural to expect that the entire domain will be divided into two equal parts. What we get for the first core, cell_x_min(rank+1), cell_x_max(rank+1), cell_y_min(rank+1), cell_y_max(rank+1), cell_z_min(rank+1), cell_z_max(rank+1) = 1 250 1 250 1 125, is exactly what we expected: core 0 took exactly half of the cube. But what we get for the other core looks shocking:

    cell_x_min(rank+1),cell_x_max(rank+1), cell_y_min(rank+1),cell_y_max(rank+1), cell_z_min(rank+1),cell_z_max(rank+1)= 22050 5 32739 32739 126 250

What is that??? The z range is correct, by the way: it took exactly the other half of the array, from 126 to 250.

Status-Mirror commented 1 year ago

Hey Dan,

Let's consider the cell_x_max array. By navigating to epoch3d/src and running the Linux terminal command grep -rnw '.' -e 'cell_x_max', we can list all the files this array appears in.

We can see that it is initially allocated in mpi_routines.F90, and it is allocated to a size of nprocx. Here, nprocx is the number of processors in the x direction, which in your example is 1 (the split is in $z$, not $x$). Hence, when you access cell_x_max(rank+1), you are indexing outside the bounds of the array, which explains the nonsense result.

To further verify this, if you ran print*, cell_x_max, you would only print a single number.

By sheer coincidence, we also have nprocz=2. Because there are no splits in $x$ and $y$, the rank-based indexing actually works for the z arrays here - although this is misleading, as it will only "work" when your domain is split in one dimension.
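To make the failure mode concrete, here is a minimal standalone sketch (not EPOCH code; the names and values are just illustrative) of an array sized by nprocx being indexed with rank+1:

    ! Minimal illustration: cell_x_max has size nprocx = 1, but on the second
    ! MPI rank we would have rank = 1, so rank+1 = 2 is out of bounds.
    PROGRAM bounds_demo
      IMPLICIT NONE
      INTEGER, PARAMETER :: nprocx = 1
      INTEGER :: rank
      INTEGER, ALLOCATABLE :: cell_x_max(:)

      ALLOCATE(cell_x_max(nprocx))
      cell_x_max(1) = 250

      rank = 1                       ! pretend this is the second rank
      PRINT *, cell_x_max(rank + 1)  ! index 2 of a size-1 array: out of bounds,
                                     ! so this prints whatever happens to sit in
                                     ! memory after the array
    END PROGRAM bounds_demo

Building with runtime bounds checking (for example gfortran's -fcheck=bounds) would turn those garbage numbers into an explicit error at the offending line.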

Hope this helps, Stuart

DanRRRR commented 1 year ago

Wow! If cell_x_min, cell_x_max, cell_y_min, cell_y_max, cell_z_min and cell_z_max are not defined for all cores, even when there is no division in the corresponding dimension, that makes the logic for finding the indices of the decomposed grid on each core pretty complex.

If I want to do text or binary output of the E-field independently of SDF, how do I find these indices for each sub-array (for what each core is dumping in this subroutine)? Does some other array report the indices of the global array that are used by each specific core?

In principle, if the entire field array exists in memory, I could take this entire array instead of the portions used in this subroutine (if it is possible to interrupt the code for the duration of the output).

Status-Mirror commented 1 year ago

There may be some misunderstanding - all ranks can see the locations of all splits. These arrays don't refer only to the cell boundaries of the current rank.

The entire field does not exist in memory on any one MPI rank. This is why each processor writes its section of the global array to a pre-determined offset in the output file, using the SDF library. For the fields, each rank writes ex(1:nx,1:ny,1:nz), where nx, ny and nz are the local sizes on that rank and are, in general, different on each rank.

DanRRRR commented 1 year ago

One of the developers wrote me this some time ago: "the decomposition of the grid is stored in the variables cell_x_min, cell_x_max, cell_y_min, cell_y_max, cell_z_min and cell_z_max. Each of these are 1D integer arrays that contain the maximum and minimum values in the hypothetical global array for the part of the fields that each processor holds. So if you want to find the minimum index in X for the current rank then that would be "cell_x_min(rank)"." etc.

Was he wrong, apart from his typo that it has to be cell_x_min(rank+1), or am I still missing something?

Status-Mirror commented 1 year ago

I agree with most of what the last developer said, but I think a mistake was made in how to access the information from cell_x_min. I believe both cell_x_min(rank) and cell_x_min(rank+1) are wrong.

In 3D, domains are split along 3 directions: $(x, y, z)$. In your example, the domain decomposition is $(1\times 1 \times 2)$. It's not the rank that's important, it's the position of the domain on the decomposition grid. This isn't found using rank, but instead using x_coords, y_coords and z_coords.

I think you should be using cell_x_min(x_coords+1), cell_y_min(y_coords+1), cell_z_min(z_coords+1).
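As a quick check, a sketch of your diagnostic print using that indexing (untested; it assumes rank, x_coords, y_coords, z_coords and the cell_* arrays are in scope, as they are in diagnostics.F90) would be:

    ! Global cell range held by this rank, indexed by the Cartesian
    ! coordinates of the rank rather than by the rank number itself
    print*, 'rank', rank, ' x:', cell_x_min(x_coords+1), cell_x_max(x_coords+1), &
        ' y:', cell_y_min(y_coords+1), cell_y_max(y_coords+1), &
        ' z:', cell_z_min(z_coords+1), cell_z_max(z_coords+1)

On your 2-core cone.deck run this should report identical x and y ranges (1 to 250) on both ranks, with z split into 1 to 125 and 126 to 250. The same arrays then give the local-to-global index mapping, e.g. ix_global = ix_local + cell_x_min(x_coords+1) - 1.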

It's worth noting I didn't develop the core MPI and output routines, so I only have access to the same information as you. But, from what I can see, I think this should work for you.

Cheers, Stuart

DanRRRR commented 1 year ago

Thanks for the hint to explore, Stuart!

DanRRRR commented 1 year ago

Well, I'm still getting an error:

    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   PID 380350 RUNNING AT System-Product-Name
    =   EXIT CODE: 139
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================
    YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

I am trying to output the array slices of Ex(ix,iy,iz), Ey(ix,iy,iz) and Ez(ix,iy,iz) from each core into independent files, after the particle data for each processor core (rank) has been successfully saved in the subroutine "output_routines". So there should be two files if we run on two cores. I suspect that doing this is still illegal, since these arrays are global and hence not independent or specific to each core. Hence, at some point there is always a conflict between several competing cores accessing the global array while doing the WRITE into SDF in parallel in this subroutine...

The way it should work, I guess, is the opposite: it is necessary to use the local arrays each processor deals with and somehow transform their matrix indices into global matrix indices (if locally they always start with 1, as you mentioned above: ex(1:nx,1:ny,1:nz)).

But maybe the field arrays that are local to each core (in that case they must have different names than ex, ey, ez; what are their names?) still use global index numbering and do not start with 1?

This is how the involvement of the proprietary SDF has complicated the code: the only way to get the fields is to read the SDF file in MATLAB or Python and extract the data at 1% of the speed. This binary output format for EPOCH data is killing me; I have already lost a couple of months on this...

Status-Mirror commented 1 year ago

Usually we don't offer support on EPOCH extensions; we only cover issues with the most recent version. However, what you're proposing is quite a small task, so I have written the following solution for you.

Consider the 2D cone.deck file - this spans $250\times 250$ cells. When running on 4 cores, each core only creates arrays of size $125\times 125$. Say we wish to output the fields from ix = 100 to 200, and iy = 1 to 250. This is tricky, as the data is stored on different ranks, and a rank may not have enough memory to load a global copy of the array in the general case. We need to print the output from each rank separately.

In the write_field subroutine of io/diagnostics.F90, there are lines which read:

    IF (restart_id .OR. (.NOT.dump_skipped .AND. unaveraged_id)) THEN
      CALL sdf_write_plain_variable(sdf_handle, TRIM(block_id), &
          TRIM(name), TRIM(units), dims, stagger, 'grid', array, &
          subtype, subarray, convert)
      dump_field_grid = .TRUE.
    END IF

These lines call the commands which write a generic field (array) to the SDF file, and are only activated on steps which dump the field (set by the output block). This seems like a sensible place to add a new output for a reduced field. After dump_field_grid = .TRUE., you can add the following code:

      ! These are the global indices of the field slice to print
      print_ix_min_global = 100
      print_ix_max_global = 200
      print_iy_min_global = 1
      print_iy_max_global = 250

      ! Identify if the current rank contains grid-points to print
      IF (cell_x_min(x_coords+1) < print_ix_max_global .AND. &
          cell_x_max(x_coords+1) > print_ix_min_global .AND. &
          cell_y_min(y_coords+1) < print_iy_max_global .AND. &
          cell_y_max(y_coords+1) > print_iy_min_global) THEN

        ! Identify the indices to print on the local rank
        ! Use MAX and MIN to prevent writing array elements beyond the rank
        print_ix_min_local = &
            MAX(print_ix_min_global - cell_x_min(x_coords+1) + 1, 1)
        print_ix_max_local = &
            MIN(print_ix_max_global - cell_x_min(x_coords+1) + 1, nx)
        print_iy_min_local = &
            MAX(print_iy_min_global - cell_y_min(y_coords+1) + 1, 1)
        print_iy_max_local = &
            MIN(print_iy_max_global - cell_y_min(y_coords+1) + 1, ny)

        ! Open a unique file for each rank
        WRITE(rank_string, '(I3)') rank
        WRITE(step_string, '(I6)') step
        file_unit = 1142+rank
        OPEN(unit = file_unit, file = 'dump_' // TRIM(block_id) // '_rank_' &
            // TRIM(rank_string) // '_step_' // TRIM(step_string) // '.dat', &
            status = 'new') 

        ! Write the global array indices present in this file
        WRITE(file_unit, *) "Global ix range:", &
            print_ix_min_local + cell_x_min(x_coords+1) - 1, " to ", &
            print_ix_max_local + cell_x_min(x_coords+1) - 1
        WRITE(file_unit, *) "Global iy range:", &
            print_iy_min_local + cell_y_min(y_coords+1) - 1, " to ", &
            print_iy_max_local + cell_y_min(y_coords+1) - 1 

        ! Write the data one line at a time
        DO iy = print_iy_min_local, print_iy_max_local 
          WRITE(file_unit, *) &
              array(print_ix_min_local:print_ix_max_local, iy)
        END DO

        ! Close the file
        CLOSE(unit = file_unit)
      END IF

Note, you will have to add:

    INTEGER :: print_ix_min_global, print_ix_max_global
    INTEGER :: print_iy_min_global, print_iy_max_global
    INTEGER :: print_ix_min_local, print_ix_max_local
    INTEGER :: print_iy_min_local, print_iy_max_local
    INTEGER :: iy, file_unit 
    CHARACTER(LEN=string_length) :: rank_string, step_string

to the start of the write_field subroutine.

The result is that whenever the code writes a field to SDF, it also creates a file from each rank (ignoring ranks containing no data in the slice). The global indices of the array are listed at the start of the file. In this example, the file titled "dump_ey_rank_  0_step_     0.dat" starts with the lines:

    Global ix range:         100  to          125
    Global iy range:           1  to          125

as expected for rank 0, which holds global ix = 1 to 125 and iy = 1 to 125, so the requested ix range of 100 to 200 is clipped to 100 to 125.
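If it's useful, here is a rough sketch of reading one of these per-rank files back in Fortran; the file name and index ranges below are hard-coded assumptions for rank 0 at step 0 and have to match the file you are reading (or be parsed from its two header lines):

    ! Rough sketch: read back one per-rank dump file produced by the code above.
    ! The file name and the index ranges are assumptions for rank 0, step 0.
    PROGRAM read_rank_dump
      IMPLICIT NONE
      INTEGER, PARAMETER :: num = KIND(1.d0)
      INTEGER, PARAMETER :: ix_lo = 100, ix_hi = 125, iy_lo = 1, iy_hi = 125
      REAL(num), ALLOCATABLE :: slice(:,:)
      CHARACTER(LEN=256) :: header
      INTEGER :: iu, iy

      ALLOCATE(slice(ix_lo:ix_hi, iy_lo:iy_hi))

      ! Note the blanks in the name, produced by the I3/I6 edit descriptors
      OPEN(newunit=iu, file='dump_ey_rank_  0_step_     0.dat', status='old')
      READ(iu, '(A)') header   ! "Global ix range: ..."
      READ(iu, '(A)') header   ! "Global iy range: ..."
      DO iy = iy_lo, iy_hi
        READ(iu, *) slice(:, iy)
      END DO
      CLOSE(iu)

      PRINT *, 'Read slice, maximum value =', MAXVAL(slice)
    END PROGRAM read_rank_dump

This mirrors the one-line-per-iy layout of the WRITE loop above.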

I will be unable to provide further support on your proposed extension - hopefully you can modify this example to suit your needs.

Finally, I'll say that MATLAB and Python aren't the only ways to extract data from SDF, and the file format isn't a trade secret. We provide documentation on how these files are structured, so you can create your own readers in any language. Binary files have the advantage of being much smaller than ASCII files, which is important for the massive data files which can be generated in EPOCH simulations. I think your development time would be better spent learning to use SDF, rather than spending months trying to work around it.

Hope this helps, Stuart

DanRRRR commented 1 year ago

I was checking the email notifications and this GitHub page a few times per day, and when I did that today I found that your response was posted 4 days ago. I think I need to employ ChatGPT as a secretary to keep track of the numerous things around me, as I have been totally off any schedule lately. :)))

Thanks for the hint and the demo code to try. I hope the approach I used will also work after substituting the Ex, Ey and Ez I had used with array, since array is the local entity storing the field on each core, while Ex, Ey and Ez are global ones, which is why my approach did not work.

As for SDF, as you remember, I tried really hard to deal with it. I tried:

1) To get whoever developed SDF to cooperate with VisIt and ParaView, specifically for Windows, so that they would adopt it in their visualizers. VisIt and ParaView expressed their willingness (these two are like Microsoft and Apple in graphics), but no one responded from the SDF side, which means SDF is totally dead.

2) To convince anyone to develop a standalone converter and structure visualizer for SDF, the same way converters and visualizers exist for HDF files. With such tools anyone could extract any data from a black box that may no longer be supported in the long run. We have to keep published data for 10 years; this is what top journals require. Can we keep data in a format which seems to have been dead for a long time?

3) Also, as time progressed, the software community essentially solved the problem of binary-format compatibility, so there is no need for special formats any more. The 200 different formats VisIt supports caused it to compile for an entire day on a few dozen cores and to fail at the end, and this happens on a not-that-new, three-year-old OS like Ubuntu 22.

4) The Fortran STREAM binary format is much faster and more compact than any proprietary format, is damage resistant, and is now compatible across hardware and compilers, so why bother with formats which have no future?

5) Also, EPOCH has the ability to read a user's external initial data from binary or unformatted files, so why does it have no ability to dump its generated data in binary form? Users would be able to use this data with absolutely any compilers or other software they are familiar with. Instead we have only IDL, MATLAB and Python as choices. Modern Fortran has also, as time progressed, mostly removed any need for IDL, MATLAB, Python, C/C++ and all other languages for the essential needs of scientists and engineers. Just look at Silverfrost Fortran and other Fortran compilers: they are all pretty compatible and interoperable with each other and with other languages, and let you create GUIs, graphics, interaction with the OS, etc.

DanRRRR commented 1 year ago

I finally succeeded in accessing the field data. Thanks again; without this help it would have taken forever.

And please do something with SDF.

SDF is quietly killing EPOCH, reducing its adoption several times over. Please add direct output of all major data; remember that the typical user of PIC codes is not a computer-science-department programmer. Or, while keeping SDF, at least add HDF5, which is also a dying format, but one that is much more widely adopted and has many tools for extracting the data.

Also, when issues are marked as CLOSED, no one sees the solutions, which are potentially useful for everyone. It would be better to add a new topic, something like "Useful Solutions", and place them there.