Closed emanuelegiona closed 1 year ago
Hi @emanuelegiona, WOSS 1.12.3 fixes some issues with coordinates conversions, as reported in this changelog.
Please try to update both WOSS and woss-ns3 to the latest 1.12.5.
After that, if you are still facing the issue:
1) provide a simple .cpp example so that we can try to reproduce.
2) provide the output of the simulator (standard output to a file) with every single debug option in the helper active.
This should help in pinpointing the exact call before the NetCDF error.
3) finally, debug the ns3 example via GDB (check the ns3 wiki on how to do this) and, when the program stops due to the error, run the backtrace command and report the output here.
Thanks
regards
Thanks @MetalKnight for the quick reply. I updated to WOSS 1.12.5, as well as its ns-3 integration module; however, the problem has not been solved. I am still working on the 3.33 simulator version and using the 2020 GEBCO global grid.
If anything, the outcome has worsened; indeed, when executing simulations in the completely static scenario (characterized by fixed roaming node position and no "high-mobility" WOSS configuration), the "NetCDF: HDF error" is thrown, whereas it was working with WOSS 1.12.0 before.
Please find a simulation script acting as Minimum Reproducible Example at this link, as well as the log files you requested at steps 2 and 3:
WOSS debug options ON (scenario: no high mobility, fixed roaming node position at waypoints n. 0 and n. 10): link (279 MB zip file, ~7.5 GB per log file, 2 log files)
WOSS debug options ON (scenario: high mobility, roaming node): link (146 MB zip file, ~7.4 GB per log file, 1 log file)
GDB backtrace for WOSS configuration "no high-mobility" and fixed roaming node to waypoint n. 0: Attached file: gdb_bracktrace_fixed.log
GDB backtrace for WOSS configuration "high-mobility" and roaming node: Attached file: gdb_bracktrace_roaming.log
thanks @emanuelegiona we will check and get back.
In the meanwhile could you please:
4) provide system specs, (distro, kernel, gcc version)
5) check if the sediment V1 dbs have the same issue. Be aware that you will also need to change SedimentDbDeck41DbType to 0
Thanks for looking into it, your help is much appreciated.
Below you can find the additional information you asked for:
Distro: Ubuntu 18.04 LTS (also due to acoustic toolbox gfortran requirements)
Kernel: 5.4.0-150-generic
GCC: 7.5.0
Simulation outcomes when using sediment V1:
a. WOSS configuration "no high-mobility" + roaming node fixed at waypoint 0 position: working
b. WOSS configuration "no high-mobility" + roaming node: NetCDF: HDF error (ncVar.cpp:1650)
c. WOSS configuration "high-mobility" + roaming node: NetCDF: HDF error (ncVar.cpp:1650)
Please find attached GDB backtrace files for scenarios in which the error occurs:
P.S. For the sake of complete transparency, the following code changes have been applied to the MRE: a new CLI option --wossSedimVersion has been added, reflecting a similarly named Experiment member field which is used in the following way:
if(m_wossSedimVersion == 1)
  {
    m_wossHelper->SetAttribute("SedimDbCoordFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_coordinates.nc"));
    m_wossHelper->SetAttribute("SedimDbMarsdenFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_marsden_square.nc"));
    m_wossHelper->SetAttribute("SedimDbMarsdenOneFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_marsden_one_degree.nc"));
    m_wossHelper->SetAttribute("SedimentDbDeck41DbType", IntegerValue (0)); // DECK41 V1 database data format
  }
else if(m_wossSedimVersion == 2)
  {
    m_wossHelper->SetAttribute("SedimDbCoordFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_V2_coordinates.nc"));
    m_wossHelper->SetAttribute("SedimDbMarsdenFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_V2_marsden_square.nc"));
    m_wossHelper->SetAttribute("SedimDbMarsdenOneFilePath", StringValue (m_wossDbsPath + "/seafloor_sediment/DECK41_V2_marsden_one_degree.nc"));
    m_wossHelper->SetAttribute("SedimentDbDeck41DbType", IntegerValue (1)); // DECK41 V2 database data format
  }
else
  {
    NS_FATAL_ERROR("Experiment::InitialSetup: invalid WOSS sediment version [1 or 2]");
    return;
  }
I see you encountered this very same error before and reported on it as well; apologies for not adding to that open issue myself.
However, is there any temporary known workaround you are using and, in case there is none, would introducing a throttling mechanism be helpful, in your opinion? This so-called throttling mechanism would either be implemented by:
1) keeping a rough counter of the NetCDF calls, reset once a sleep() function is executed upon reaching a certain limit; or
2) wrapping the failing NetCDF call to handle the exception, thus invoking sleep() and afterwards retrying the execution of the same NetCDF call.
In both solutions, the sleep() duration should be chosen to introduce as little overhead as possible, otherwise simulations that are long-running or large (in terms of nodes) are going to be affected too much.
Additionally, solution 1 depends on an arbitrary limit of NetCDF calls, that you estimated in the range of 500k: this might be system-dependent. Moreover, the configuration of such limit should account for eventually turning off such throttling, in order to avoid future code changes once the NetCDF team solves this error.
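To make solution 1 concrete, here is a minimal sketch of such a counter. The class name CallThrottle and its methods are hypothetical and not part of WOSS or NetCDF; the limit and pause values are placeholders to be tuned per system, and a limit of 0 switches the throttling off without code changes.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical helper (not a WOSS or NetCDF API): counts calls and pauses
// once a configurable limit is reached, then starts counting again.
class CallThrottle
{
public:
  // limit == 0 disables throttling entirely.
  CallThrottle (std::size_t limit, std::chrono::milliseconds pause)
    : m_limit (limit), m_pause (pause)
  {
  }

  // Meant to be invoked once before each NetCDF access.
  // Returns true if this invocation triggered a pause.
  bool Tick ()
  {
    if (m_limit == 0)
      {
        return false; // throttling switched off
      }
    if (++m_count < m_limit)
      {
        return false; // limit not reached yet
      }
    std::this_thread::sleep_for (m_pause);
    m_count = 0; // reset the counter after the pause
    ++m_pauses;
    return true;
  }

  // Number of pauses taken so far, useful to estimate the overhead.
  std::size_t Pauses () const { return m_pauses; }

private:
  std::size_t m_limit;
  std::chrono::milliseconds m_pause;
  std::size_t m_count = 0;
  std::size_t m_pauses = 0;
};
```

With a limit around the estimated 500k calls and a pause of a few milliseconds, the added wall-clock time would stay small, but both numbers are guesses rather than measured values.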
Solution 2 instead does not require the extra effort needed for the previous one, also benefiting the entire WOSS codebase for a more robust interaction with NetCDF.
Once a solution on the NetCDF part is implemented, the exception handling code would become obsolete for this edge case, but still be useful in the wake of other cases of exceptions.
In the latter implementation, the sleep() duration might be computed similarly to backoffs in communication protocols, gradually increasing at each attempt up to a fixed maximum number of attempts. Once the maximum number of attempts is reached, execution might crash or provide a safe default value (e.g. simple 3-ray model with flat seabed surface).
@emanuelegiona the problem is that we don't know 1) if this is a NetCDF4 issue or 2) if this is an HDF5 library issue.
We don't know the reason why the library is throwing an HDF error. So we really don't know if the simulation can continue after that error, meaning that even if we catch the error it is still possible that every subsequent getVar() call will fail. I don't know if a sleep() could help. If this is a logical issue, I don't see how it could.
The first option here is to try building the latest HDF5 and NetCDF4, meaning:
1) download the latest HDF5 (1.14.2) and build it with the same instructions
2) download the latest NetCDF4-C (4.9.2) and build it with the same instructions
3) rebuild and relink NetCDF4-C++ against the two newly installed libraries
4) rebuild WOSS (preferably 1.12.5) against the latest NetCDF4 library
5) rebuild woss-ns3 (preferably 1.12.5) against the latest NetCDF4 library
and finally check if the issue is still present.
By the way, I encourage you to move to WOSS 1.12.5 which handles coordinates conversion properly.
Hi @emanuelegiona, on my Ubuntu 22.04 machine with gcc 11.4.0 and the recommended libraries (WOSS, woss-ns3, NetCDF4, HDF5, NetCDF4-C++ etc.), and using your example (after tweaking it to use the standard UAN PHY and no cmdline args), the issue was reproduced. We will check what happens with the latest HDF5 and NetCDF4-C.
@emanuelegiona I can't seem to reproduce the issue with:
How to install.
relaunch the test.
Let me know your results. thanks
Upgrading such libraries appears to be fixing the crashes.
My tests did not crash in either case: roaming node with no-high-mobility configuration (identical setup as your execution with no further CLI arguments), and roaming node with high-mobility configuration.
Thanks for looking into it.
P.S. On an unrelated note: is there any way to turn off BellhopWoss::checkDepthOffsets() warnings from the ns-3 interface? Even when turning all WOSS debug options OFF, they are still shown. Sorry if this is not the appropriate place to discuss it.
Thanks for confirming this, I will close the issue. The WOSS website has already been updated with the new recommended libraries and installation instructions. That warning is always printed since it tells you that something is not properly configured with the DepthOffset in your test scenario.
I'll see what I can do in the next WOSS release. cheers
Dear colleagues,
I am working on mobile network simulation scenarios leveraging WOSS through this integration module.
Simulation scenario
Using the example as base, I modeled my scenario equipping all nodes with WossWaypointMobilityModel, with a set of 4 nodes having fixed location and a single node roaming through the network.
Fixed nodes location example (CSV):
Roaming node waypoints file example (CSV):
Geodesic coordinates from such files are fed to the mobility model via CreateVectorFromCoordZ() after having created woss::CoordZ objects from related fields. Depth is passed both to the CoordZ constructor and via the CoordZ::setDepth() function after each object creation.
WOSS configuration
According to the advice regarding "high mobility" scenarios (1)(2), we can devise the following WOSS configurations:
1) ResDb + no memory optimization: WossHelper attributes ResDbFilePath and ResDbFileName are properly defined, and the WossPropModel instance used for WossChannel is created with default attribute values.
2) No ResDb + memory optimization: Following the previously mentioned advice, ResDbFilePath and ResDbFileName attributes of WossHelper are left with default values, whereas WossPropModel's MemoryOptimization attribute is set to true.
Outcome: error not appearing
Upon executing my simulation, I noticed that the "NetCDF: HDF error" is absent only if the network is completely static and configured accordingly: i.e. WOSS configuration 1, with the roaming node having just 2 waypoints, one at time 0 and the other at time N, both having the same position (e.g. waypoint 0), effectively making it a static node as well.
Outcome: error consistently appearing
The error instead consistently shows up whenever the following simulation setups are executed:
Activating all WOSS-related debug options via the WossHelper interface does not shed more light on this error, which is only accompanied by the exception source "ncVar.cpp line:1626".
Simulations during which the error occurs actually run for some time first, and then crash upon this error appearing. In order to rule out possible invalid locations of the roaming node, the "effectively static node" setup has been tested across multiple different locations for both WOSS configuration 1 and 2. WOSS configuration 2 was thus identified as problematic, whereas WOSS configuration 1 did not pose issues in the "effectively static node" case.
System setup
All libraries are installed as per instructions, passing all tests.