marchdf closed this issue 1 month ago
I think this might affect a lot of users if they're not aware of this change. I do like the native format because it should be a pretty good speed-up over netcdf. But one downside is that we've got a lot of utilities set up for netcdf sampling postprocessing (like this one here). To take full advantage of this, we should create a similar set of Python utilities for the native format, and not necessarily rely on Paraview for loading and visualization.
Also tagging @rybchuk.
Lawrence
Yeah... I hate breaking workflows. I hate defaults that make the code slow. So I feel stuck.
I would advocate for keeping the default sampling data format to netCDF. If a user is saving out data, presumably they want to analyze that data. There is lots of support for analyzing netCDF data. I went on a side quest a year or two ago where I had to mess with AMReX native checkpoint files (https://github.com/rybchuk/amrex_dask) and found little support for working with the AMReX native format.
Maybe add a warning during the run initialization that says that the native format will run more quickly? Or like Lawrence is saying, maybe add some official demo scripts where AMReX native sampling data is being processed?
I think I also have a slight preference for defaulting to netcdf, but I don't think I care too much... it does make me think I need to go in and add some asserts for the Spinner sampler and Radar sampler, because they depend on netcdf output. Hopefully they're the only samplers that need that...
pyamrex has seen a lot of development in the last year or so that makes loading amrex native data quite simple. Native output for plt files and particles is supported by the major visualizers (visit, paraview, yt).
That being said, it sounds like the preference is for a slower default and keeping the existing post-processing workflow. I will leave this open for several more days in case someone wants to chime in. I am adding a warning when using netcdf here: https://github.com/Exawind/amr-wind/pull/1053
If this is very important we could reopen #971 and fix the netcdf IO for sampling (which I believe is due to the root proc doing all the work). But my guess is that we will never get it close to the native amrex particle IO.
Having never used native myself, I also have a slight preference towards keeping the default as netcdf. I also have my own tools set up for netcdf. However, whenever I can, I like to be explicit, so I always specify netcdf (honestly, I didn't even know what the default was). I don't have strong feelings about this change, but if it is to be merged in, I think a warning would be good to have.
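For anyone who wants to pin the current behavior regardless of what the default becomes, the output format can be set explicitly per sampling instance in the input deck. A minimal fragment (the label `sampling` here is a placeholder for whatever label your own deck uses):

```
# Explicitly request netCDF output for a sampler labeled "sampling",
# so a change to the project-wide default does not alter this run.
sampling.output_format = netcdf
```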
I think changing to native as the default is the better option. NetCDF requires a parallel-aware build of the netCDF libraries.
I think we got enough feedback. Going to close this. Hopefully we can move the post-processing at some point to use the native IO and then be able to run faster sims.
I know this issue is closed, but I came across unexpected behavior and wanted to note it somewhere. I am running a job with 345 PlaneSampler and LidarSampler instances saving out data at every timestep. I've been using netCDF, and I switched to AMReX native output to try and see a speedup. Unexpectedly, I'm seeing a 10x slowdown: timesteps with netCDF typically take ~5 seconds (though I'm seeing them spike inexplicably), but timesteps with the native format are taking ~50 seconds.
Here's one sampling instance for reference:
# ------------- Sampling parameters for one imaginary turbine at (1120.0 m, 340.0 m, 0.0 m) -------------
sampling0000.output_format = native
sampling0000.output_frequency = 1
sampling0000.fields = velocity temperature tke
sampling0000.labels = inflow-bc-3D inflow-scan
# Inflow boundary condition plane, 3D
sampling0000.inflow-bc-3D.type = PlaneSampler
sampling0000.inflow-bc-3D.num_points = 64 64
sampling0000.inflow-bc-3D.origin = 740.0 20.0 0.0
sampling0000.inflow-bc-3D.axis1 = 0.0 630.0 0.0
sampling0000.inflow-bc-3D.axis2 = 0.0 0.0 630.0
sampling0000.inflow-scan.type = LidarSampler
sampling0000.inflow-scan.num_points = 83 # Range gate resolution of 12.0 m
sampling0000.inflow-scan.length = 996.0
sampling0000.inflow-scan.origin = 1120.0 340.0 120.0 # Lidar situated 3.0 m above hub height
sampling0000.inflow-scan.periodic = true
sampling0000.inflow-scan.time_table = 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0 2.25 2.5 2.75 3.0 3.25 3.5 3.75 4.0 4.25 4.5 4.75 5.0 5.25 5.5 5.75 6.0 6.25 6.5 6.75 7.0 7.25 7.5 7.75 8.0 8.25 8.5 8.75 9.0 9.25 9.5 9.75 10.0 10.25 10.5 10.75 11.0 11.25 11.5 11.75 12.0 12.25 12.5 12.75 13.0 13.25 13.5 13.75 14.0 14.25 14.5 14.75 15.0 15.25 15.5 15.75 16.0 18.0
sampling0000.inflow-scan.azimuth_table = 195.0 195.0 195.0 195.0 193.0 193.0 193.0 193.0 191.0 191.0 191.0 191.0 189.0 189.0 189.0 189.0 187.0 187.0 187.0 187.0 185.0 185.0 185.0 185.0 183.0 183.0 183.0 183.0 181.0 181.0 181.0 181.0 179.0 179.0 179.0 179.0 177.0 177.0 177.0 177.0 175.0 175.0 175.0 175.0 173.0 173.0 173.0 173.0 171.0 171.0 171.0 171.0 169.0 169.0 169.0 169.0 167.0 167.0 167.0 167.0 165.0 165.0 165.0 165.0 0.0 0.0
sampling0000.inflow-scan.elevation_table = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 45.0 45.0
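The lidar tables above are easy to get out of sync by hand (each has 66 entries here). Below is a small stdlib-only sketch, not an official AMR-Wind tool, that checks that the `time_table`, `azimuth_table`, and `elevation_table` entries in a deck excerpt have matching lengths; the shortened `deck` string is a hypothetical example:

```python
# Minimal sanity check for LidarSampler tables in an AMR-Wind input deck.
# Illustrative sketch only; the short deck below is a made-up excerpt.

def parse_table(line):
    """Split 'key = v0 v1 ...' into floats, dropping any trailing '#' comment."""
    _, _, rhs = line.partition("=")
    rhs = rhs.split("#")[0]
    return [float(v) for v in rhs.split()]

deck = """
sampling0000.inflow-scan.time_table = 0.0 0.25 0.5 0.75 1.0
sampling0000.inflow-scan.azimuth_table = 195.0 195.0 195.0 195.0 193.0
sampling0000.inflow-scan.elevation_table = 0.0 0.0 0.0 0.0 0.0
"""

tables = {}
for line in deck.strip().splitlines():
    key = line.split("=")[0].strip()
    if key.endswith(("time_table", "azimuth_table", "elevation_table")):
        # Keep only the last dotted component as the table name.
        tables[key.rsplit(".", 1)[-1]] = parse_table(line)

lengths = {name: len(vals) for name, vals in tables.items()}
assert len(set(lengths.values())) == 1, f"table length mismatch: {lengths}"
print(lengths)  # each table has 5 entries in this toy deck
```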
Interesting and disappointing. I would be interested in seeing the time step timing summary and the output of the profiler at the end of the run (assuming you turned on the tiny profiler).
Here are tiny profiles for a job that ran for 8 timesteps with netcdf and 8 timesteps with native. The netcdf job didn't yet show any of the 40-second spikes that I see later in a run (mentioned in #1104).
It is interesting that one case is much more load balanced than the other. I don't know why ParticleContainer::RedistributeMPI() would be that different... WriteBinaryParticleData looks awful... It is unfortunate that we can't tell which sampler is causing this. The planes? The lines? The lidar? If you wouldn't mind posting the full input file here for reference, maybe we can have someone dig into this. You could try disabling the plane sampling and see if anything gets better. Same for the lidar... basically, try to isolate the culprit.
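One way to do that bisection, assuming samplers are instantiated from the `labels` list (which is how the deck excerpt above is structured): drop a label so that sampler is never created. A hypothetical fragment based on that excerpt:

```
# Run with only the plane sampler active; the inflow-scan (lidar) block
# is ignored because its label is no longer listed.
sampling0000.labels = inflow-bc-3D
```

Re-running with each label alone should show which sampler dominates the WriteBinaryParticleData time.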
Happy to share the files! I'm going to keep hunting for a quick fix. I appreciate the interest :) The simulation starts from a checkpoint, but the same behavior should happen starting from t=0 seconds. If not, I can share the checkpoint file too.
job_output_filename.4369987.txt abl_netcdf.txt runfile32.txt abl_native.txt
And some extra context: here's some output from a netCDF job. As the simulation goes on, timesteps take longer and longer, starting at roughly 5 sec per timestep and ending at roughly 100 sec per timestep. Maybe there is some sort of memory leak happening?
Summary
Make Sampling default to native output, which is much faster: https://github.com/Exawind/amr-wind/issues/971. If users don't have
sampling_label.output_format = netcdf
then they will see a change in behavior (faster runs, but native amrex particle files instead of netcdf files; this will affect sampling post-processing). Tagging some users for discussion: @lawrenceccheung, @rthedin, @ndevelder, @neilmatula, who else?
Pull request type
Please check the type of change introduced: