Closed: dsidoren closed this issue 6 months ago
Elaborating more on the above issue and the rationale, in the hope of gathering more attention ;).
We are trying to use multio in our unstructured-grid ocean model, FESOM, as a backend for IO, in particular to write GRIB.
The API test examples like multio-replay are focused on a structured-grid representation, which we failed to easily adapt to our grid. So we needed a simple example that we can build upon for our unstructured model output. In the simplest sense the global fields are 1D, so we made a simple prototype that sheds light on how to use multio in our model: chunk a global field to emulate parallel computation and write the output via multio to a single file. As we are also not experts in GRIB, it was logical to simplify to just writing raw binary.
Here is a snippet of our working prototype for 7 clients and 1 server that dumps raw binary, either into a single file or one per server:
cerr = cc%mpi_client_id("oce")
cerr = multio_initialise()
cerr = mio%new(cc)
cerr = mio%open_connections()
cerr = md%new()
cerr = md%set_string_value("name", "Fesom Grid")
!cerr = md%set_string_value("name", "any") ! also works
cerr = md%set_string_value("category", "ocean-domain-map")
cerr = md%set_string_value("representation", "unstructured")
cerr = md%set_int_value("globalSize", 700 )
cerr = md%set_bool_value("toAllServers", .TRUE._1)
buffer=[(myrank*100+i-1, i=1,100)]
cerr = mio%write_domain(md, buffer, size(buffer))
!Writing fields
cerr = md%new()
cerr = md%set_string_value("category", "ocean-2d")
cerr = md%set_int_value("globalSize", 700)
cerr = md%set_int_value("level", 0)
cerr = md%set_bool_value("toAllServers", .FALSE._1)
cerr = md%set_string_value("name", "sst")
cerr = md%set_string_value("param", "sst")
cerr = md%set_int_value("step", 1)
cerr = md%set_string_value("gridSubType", "Fesom Grid") ! also works without
cerr = md%set_string_value("domain", "Fesom Grid") ! also works without
cerr = md%set_string_value("typeOfLevel", "oceanSurface") ! also works without
values = [(myrank*100+i-1, i=1,100)]
cerr = mio%write_field(md, values, size(values))
and the corresponding server config:
client:
  on-error: abort-all-transports
  plans:
    - name : ocean-replay-grid-info-stream
      actions :
        - type : select
          match :
            - category : [ocean-domain-map, ocean-mask]
        #- type: print  # used to debug on screen
        - type : transport
          target : server
    - name : ocean-replay-test-stream1
      actions :
        - type : select
          match :
            - name: [ sst, ssv ]
        - type : transport
          target : server
server:
  transport : mpi
  #group: world
  plans :
    - name : ocean-fields
      actions :
        - type : select
          match :
            - category : [ocean-2d]
        - type : encode
          format : raw
        - type : sink
          sinks :
            - type : file
              #append : true
              #per-server : true
              path : ocean-output-field.bin  # read in python as np.fromfile("ocean-output-field.bin", np.double)
The expected output for 7 clients is 0-699 in that order, but the output data is in a random order on each run, as if the server dumps data to the file as it receives it from different processes. In general I would imagine there is a need to keep an association between rank and the local indices of data mapped to global indices, and I can't see how this happens with the metadata and data we sent in the example. What are we missing?
I can imagine many things that may have gone wrong in the example above in relation to the working structured-data example: not using communicators as intended, gathering working only for GRIB and not raw binary because write_domain takes more data in the structured case, or, say, lat-lon coordinates being used internally to sort the gathered data in the structured case...
While we explore more, @dsarmany @simondsmart, it would be very helpful if you could shed some light on this, at least on how the multio server knows how to map local data to global indices. Are we missing any metadata for the domain or fields?
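To make concrete what we expect to read back, here is a small Python sketch (the filename comes from the config comment above; the dtype is assumed to be double, and the 7-client/100-point split is from our prototype):

```python
import numpy as np

# Each of the 7 clients contributes values = [(myrank*100 + i - 1, i = 1..100)],
# so an order-preserving aggregation should yield 0..699 back to back.
chunks = [np.arange(rank * 100, (rank + 1) * 100, dtype=np.float64)
          for rank in range(7)]
expected = np.concatenate(chunks)
assert np.array_equal(expected, np.arange(700, dtype=np.float64))

# After a run, the actual file check would be:
#   data = np.fromfile("ocean-output-field.bin", np.double)
#   assert np.array_equal(data, expected)
```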
Yes, you are right. With that configuration, the 'server dumps data to a file as it receives from different processes'.
If what you'd like to do is to create a global, aggregated field from the partial fields (sent from each client to a server), you will need to add an 'aggregate' action to your configuration file. That will use the local-to-global mapping information you communicate with cerr = mio%write_domain(md, buffer, size(buffer)) and create a global field where the values are placed according to that mapping.
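For concreteness, a sketch of how the server plan from the config above might look with the aggregation action added (this is an illustration based on this thread, not a verified schema; a later comment in this thread spells the action type as 'aggregation'):

```yaml
server:
  transport : mpi
  plans :
    - name : ocean-fields
      actions :
        - type : select
          match :
            - category : [ocean-2d]
        - type : aggregation   # uses the local-to-global mapping sent via write_domain
        - type : encode
          format : raw
        - type : sink
          sinks :
            - type : file
              path : ocean-output-field.bin
```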
If the test_multio_hammer_mpi test passes, it suggests that the aggregation action with the unstructured mapping representation behaves as expected. Here is the server-side configuration for that test: https://github.com/ecmwf/multio/blob/develop/tests/server/config/test-configurations.yaml#L5 .
I appreciate that currently there is no fortran test that replicates this behaviour. It is something we may add.
Looking at your fortran code, I think it should work once you add the 'aggregate' action. The only comment is that you probably want to call md%delete() after both cerr = mio%write_domain(md, buffer, size(buffer)) and cerr = mio%write_field(md, values, size(values)).
@dsarmany thanks a lot! That was helpful.
After using the aggregate action, here comes an issue that I am feeling good about :D; it supports my hunch that some domain-mapping info is missing in my example:
Exception: Cannot find domainMaps for Fesom grid
Exception: FailureAware<Action> with behaviour "propagate" on Server for context: [
Aggregation(for 1 fields = [
---> {"category":"ocean-2d","globalSize":700,"level":0,"toAllServers":false,"name":"sst","param":"sst","step":1,"domain":"Fesom grid","typeOfLevel":"oceanSurface"}]) with Message: Message(version=1, tag=Field, source=Peer(group=multio,id=3), destination=Peer(group=multio,id=7), metadata={"category":"ocean-2d","globalSize":700,"level":0,"toAllServers":false,"name":"sst","param":"sst","step":1,"domain":"Fesom grid","typeOfLevel":"oceanSurface"}, payload-size=800)
]
So how can we create a domain map for an unstructured grid? In the case of the replay example using NEMO data, it seems to me that domain mapping is done automatically by writing the fields lat_T and lon_T (for, say, domain T Grid), or is it in the data passed to mio%write_domain? Can you please confirm that? Now I am wondering what fields are needed for the domain in the unstructured case; indices seem the most obvious thing, but then what name should we use for that?
On:
> I appreciate that currently there is no fortran test that replicates this behaviour. It is something we may add.
Just a comment: thanks, any test, even in cpp/c using the multio API, is appreciated, because it is the API that we intend to use in the end. In the case of an example like hammer, the multio library is used directly and I find it hard to map all the direct library calls to the corresponding API calls. For instance, in https://github.com/ecmwf/multio/blob/45ff382cc798e6b185806485dbf60dc9b189ea01/src/multio/tools/multio-hammer.cc#L385 there is an index map for the unstructured grid, and I am not sure where that is used or how it is communicated to the server.
The exception tells you that the domain named 'Fesom Grid' has not been communicated to the server before a field associated with that domain arrived for aggregation. Looking at the code you posted before, it is not immediately clear why that is happening. Did you change it to call delete on the metadata once it is passed to multio?
Apart from that, your code snippet seems correct at a glance. You need to call write_domain before you call write_field, and make sure that the 'name' key of the domain has the same value as the 'domain' key of the field.
There is no need to worry about lon_T and lat_T. They are not related to the domain information. They are the coordinates output as fields and we need them in NEMOv4 for an unrelated reason. I don't think you will need them.
So I think your best bet at this stage is to create a small example (with much fewer than 700 points) that reproduces this behaviour. Then you can try running it with MULTIO_DEBUG=1. That will give you a lot more output but hopefully not so much to swamp you. Then we may be able to understand why the domain is not recognised.
Thank you, it works now with "- type : aggregation" and other modifications to the yaml. We also tried setting "step" and "level" in order to mimic temporal 3D output. It works for "step", but the levels are reordered in the written binary file.
do j=1, 10
.....
cerr = md%set_int_value("level", j)
cerr = mio%write_field(md, values, size(values)) ! we write each level individually
.....
end do
As a result, the first written level is level 4, then 5, 1, 7, etc. Is there any easy way to dump them in increasing/decreasing order?
Thank you!
The short answer is that no, unfortunately not, or at least not easily.
By default, MultIO treats every step and every level as a conceptually different field, so there is no attempt to preserve any order among steps and levels. That is mainly because the output is usually in GRIB format, and the order of GRIB messages in a given file makes no difference.
That said, MultIO also supports the computation of temporal statistics, in which case different steps of the same physical quantity need to arrive in order. That can be achieved, and it is also not a strict requirement, as there is usually a natural lag between the computation of subsequent steps.
Currently there is no such mechanism for levels, but it is certainly something we can consider adding. What about raising a separate ticket for that as it does not seem to be related to local-to-global mappings anymore?
Actually, could you try modifying the transport
action in your configuration file like this:
- name : ocean-replay-test-stream1
  actions :
    - type : select
      match :
        - name: [ sst, ssv ]
    - type : transport
      target : server
      hash-keys: [category, name]
This may just do the trick. Fields with different levels should then arrive in the order in which the model calls mio%write_field for them.
Thank you, that solved the problem. We finally managed to integrate multio into FESOM and it seems to work. We noticed, however, that it does not accept overlapping partitionings. As an example, running on N processes with a global domain size S, multio requires that the sum of all subdomain sizes over the N processes equals S. If this condition is not met, the code runs without any error message but the resulting file is just empty. Changing the logistics inside FESOM (passing only "shrunk" arrays to multio such that sum(s_i) = S) solved the problem. The question is whether you plan to support overlapping subdomains in the future?
If you are using 'unstructured' domain indexing, then yes, MultIO currently assumes that there are no duplicate indices into the global domain (no overlapping partitioning). With overlapping subdomains, there will be duplicate values for some global indices, in which case MultIO will either have to know which is correct or have to assume it does not matter because they are all equal.
For 'structured' domain indexing, MultIO does in fact support overlapping subdomains, as long as it follows the convention described in Sections 5.2.2 and 5.2.3 from here: https://forge.ipsl.jussieu.fr/ioserver#:~:text=Documentation-,XIOS%20user%20guide,-XIOS%20reference%20guide .
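To state the 'unstructured' constraint concretely, here is a small sketch (the function and variable names are illustrative, not multio API): the per-rank local-to-global index arrays passed to write_domain must form an exact, duplicate-free partition of the global domain.

```python
# Sketch of the non-overlap constraint for 'unstructured' domain indexing:
# the union of all local-to-global index arrays must cover 0..S-1 exactly
# once, i.e. the subdomain sizes sum to S and no global index appears twice.
def is_valid_partition(local_to_global, global_size):
    seen = set()
    total = 0
    for idx in local_to_global:
        total += len(idx)
        seen.update(idx)
    return total == global_size and seen == set(range(global_size))

# 7 non-overlapping chunks of 100 points, as in the prototype above:
ok = is_valid_partition([range(r * 100, (r + 1) * 100) for r in range(7)], 700)
# Overlapping subdomains (two ranks both claiming indices 0..99) fail:
bad = is_valid_partition([range(0, 100), range(0, 100)], 200)
```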
Can this be closed? It seems like FESOM is running now. I don't know if the last comment about non-overlapping subdomains for unstructured grids is still relevant.
Could be a feature request, if that is required in the long run.
https://github.com/ecmwf/multio/blob/d3d8781861debb35f8df19a8970475f7cc8ca61e/src/multio/api/multio_api.f90#L44
We did a Fortran example for writing a field with an unstructured representation (just a vector) in a raw format. For this we use N cores and define the local-to-global index mapping as:
Then we write a test vector field into a binary file:
It turns out that the global indexing order is not taken into account when the raw file is dumped. Namely, the processors that call "write_field" first write the first chunks of data, which obviously changes the order of the raw file content from run to run.
We were expecting that a local-to-global reordering would be made before dumping the data. Is there any option to force mio to do so?
Thank you!