Closed psakievich closed 5 months ago
Thanks @psakievich. Note that the current fix generates this error message:
>> 593 /projects/wind_uq/lcheung/NaluBuilds/exawind-manager.attaway.20240220/exawind-manager/environments/naluwindNateVerOF34patch/nalu-wind/src/Realm.C:3416:9: error: 'fld' was not declared in this scope
so I put it inside the loop:
for (unsigned i = 0; i < numStates; ++i) {
auto* fld = field->field_state(static_cast<stk::mesh::FieldState>(i));
fld->clear_sync_state();
fld->modify_on_host();
ngp_field_manager().get_field<double>(fld->mesh_meta_data_ordinal());
fld->sync_to_device();
}
We're testing this out now, let's see if fixes the problems.
Lawrence
It doesn't make sense to do both clear_sync_state and also modify/sync. Where is the field being modified? You should only do modify if you are modifying the field data. Alternatively you do clear_sync_state if you intend to overwrite the field data and you don't care which memory space has the most recent previous modification.
I'm second guessing this now too @alanw0. The modification is on L3419, but the states are all screwed up.
I'm second guessing this now too @alanw0. The modification is on L3419, but the states are all screwed up.
What do you mean by 'screwed up'? Do you think that the I/O read is failing to mark some fields or field-states as modify-on-host? We have modify_on_host calls in the I/O, but let us know if you think something is missing. Perhaps the I/O is only reading state new, and marking that field as modified-on-host, but then state old is left un-marked? Or vice-versa? That should be ok, since it would get fixed the next time you do a state rotation, or the next time you need the field (of either state) on device, which is where sync_to_device would get called.
Keep us posted if you think stk is missing some calls or has things wrong.
@alanw0 #1190 shouldn't be happening is what I mean. I'm also not really sure why we need this per component code here either though. Seems like it is likely redundant. I also don't think we need a sync to device here either. That should happen at the calling points for the code using the fields on device. Seems like this could be excessive host-device communication.
Seems like we should be able to just delete all of this: https://github.com/Exawind/nalu-wind/blob/56c0e6dbca1255eba0e2f077bc59a1b4e286669c/src/Realm.C#L3421-L3435
Provided the ioBroker marks fields as modified on host when it reads them in.
My apologies, I should have reported this earlier -- adding the fld->clear_sync_state();
inside the loop did fix the modify on host/device problems that Ken and Myra were seeing in their runs.
Lawrence
Sync issues are arising for fields that have been messed with prior to the
populate_restart
call. For restart, we are just reading from file so sync state should be clear when reading.Closes #1191