Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

Make Sampling default to native output #1052

Closed marchdf closed 1 month ago

marchdf commented 1 month ago

Summary

Make Sampling default to native output, which is much faster: https://github.com/Exawind/amr-wind/issues/971. Users who don't set sampling_label.output_format = netcdf will see a change in behavior (faster runs, but native AMReX particle files instead of netCDF files, which will affect sampling post-processing). Tagging some users for discussion: @lawrenceccheung, @rthedin, @ndevelder, @neilmatula, who else?

lawrenceccheung commented 1 month ago

I think this might affect a lot of users if they're not aware of this change. I do like the native format because it should be a pretty good speed-up over netcdf. But one downside is that we've got a lot of utilities set up for netcdf sampling postprocessing (like this one here). To take full advantage of this we should create a similar set of python utilities for the native format, and not necessarily rely on Paraview for loading and visualization.
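For anyone reading along who hasn't touched the netCDF side, here is a minimal sketch of the kind of reader those utilities build on; the file path and the one-group-per-sampler-label layout are assumptions, so adapt them to the actual output:

# Minimal sketch of poking at an AMR-Wind netCDF sampling file from Python.
# Assumptions: one netCDF group per sampler label, with coordinates and sampled
# fields stored as variables inside each group; the file name below is a
# hypothetical placeholder.
from netCDF4 import Dataset

with Dataset("post_processing/sampling00000.nc", "r") as nc:
    for label, grp in nc.groups.items():          # one group per sampler label (assumed)
        print(f"sampler: {label}")
        for name, var in grp.variables.items():   # coordinates and sampled fields
            print(f"  {name}: shape={var.shape}")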

Also tagging @rybchuk.

Lawrence

marchdf commented 1 month ago

Yeah... I hate breaking workflows. I hate defaults that make the code slow. So I feel stuck.

rybchuk commented 1 month ago

I would advocate for keeping the default sampling data format to netCDF. If a user is saving out data, presumably they want to analyze that data. There is lots of support for analyzing netCDF data. I went on a side quest a year or two ago where I had to mess with AMReX native checkpoint files (https://github.com/rybchuk/amrex_dask) and found little support for working with the AMReX native format.

Maybe add a warning during the run initialization that says that the native format will run more quickly? Or like Lawrence is saying, maybe add some official demo scripts where AMReX native sampling data is being processed?

ndevelder commented 1 month ago

I think I also have a slight preference for defaulting to netcdf, but I don't care too much... it does make me think I need to go in and add some asserts for the Spinner sampler and Radar sampler, because they depend on netcdf output. Hopefully they're the only samplers that need that...

marchdf commented 1 month ago

pyamrex has seen a lot of development in the last year or so that makes loading amrex native data quite simple. Native output for plt files and particles is supported by the major visualizers (visit, paraview, yt).
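For reference, a minimal sketch of looking at native output from Python via yt is below; the directory name is a placeholder, and whether the native sampler particle directories load directly this way is an assumption, so treat it as a starting point:

# Minimal sketch of opening AMReX native output with yt. The plotfile directory
# name is a placeholder, and field names depend on the dataset, so inspect
# field_list first. Loading the native sampler particle directories this way is
# an assumption.
import yt

ds = yt.load("plt00000")      # an AMReX plotfile directory
print(ds.field_list)          # mesh and particle fields yt detected
ad = ds.all_data()
# then access a field with one of the (type, name) tuples printed above,
# e.g. ad[some_field_tuple]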

That being said, it sounds like the preference is for a slower default code and keeping the existing post-processing workflow. I will leave this open for several more days in case someone wants to chime in. I am adding a warning when using netcdf here: https://github.com/Exawind/amr-wind/pull/1053

If this is very important we could reopen #971 and fix the netcdf IO for sampling (which I believe is due to the root proc doing all the work). But my guess is that we will never get it close to the native amrex particle IO.

rthedin commented 1 month ago

Having never used native myself, I also have a slight preference towards keeping the default as netcdf. I also have my own tools set up for netcdf. However, whenever I can, I like to be explicit, so I always specify netcdf (honestly, I didn't even know what the default was). I don't have strong feelings about this change, but if it is to be merged, I think a warning would be good to have.

hgopalan commented 1 month ago

I think changing the default to native is the better option. NetCDF requires a parallel-aware build of the netCDF libraries.

marchdf commented 1 month ago

I think we got enough feedback. Going to close this. Hopefully we can move the post-processing at some point to use the native IO and then be able to run faster sims.

rybchuk commented 2 weeks ago

I know this issue is closed, but I came across unexpected behavior and wanted to note it somewhere. I am running a job with 345 PlaneSampler and LidarSampler instances saving out data at every timestep. I've been using netCDF, and I switched to AMReX native output to try and see a speedup. Unexpectedly, I'm seeing a 10x slowdown. Timesteps with netCDF typically take ~5 seconds (though I'm seeing them inexplicably spike), but timesteps with the native format are taking ~50 seconds.

Here's one sampling instance for reference:

# ------------- Sampling parameters for one imaginary turbine at (1120.0 m, 340.0 m, 0.0 m) -------------
sampling0000.output_format                   = native
sampling0000.output_frequency                = 1
sampling0000.fields                          = velocity temperature tke
sampling0000.labels                          = inflow-bc-3D inflow-scan

# Inflow boundary condition plane, 3D
sampling0000.inflow-bc-3D.type               = PlaneSampler
sampling0000.inflow-bc-3D.num_points         = 64 64
sampling0000.inflow-bc-3D.origin             = 740.0 20.0 0.0
sampling0000.inflow-bc-3D.axis1              = 0.0 630.0 0.0
sampling0000.inflow-bc-3D.axis2              = 0.0 0.0 630.0

sampling0000.inflow-scan.type                = LidarSampler
sampling0000.inflow-scan.num_points          = 83    # Range gate resolution of 12.0 m
sampling0000.inflow-scan.length              = 996.0
sampling0000.inflow-scan.origin              = 1120.0 340.0 120.0  # Lidar situated 3.0 m above hub height
sampling0000.inflow-scan.periodic            = true
sampling0000.inflow-scan.time_table          = 0.0   0.25  0.5   0.75  1.0   1.25  1.5   1.75  2.0   2.25  2.5   2.75  3.0   3.25  3.5   3.75  4.0   4.25  4.5   4.75  5.0   5.25  5.5   5.75  6.0   6.25  6.5   6.75  7.0   7.25  7.5   7.75  8.0   8.25  8.5   8.75  9.0   9.25  9.5   9.75  10.0  10.25 10.5  10.75 11.0  11.25 11.5  11.75 12.0  12.25 12.5  12.75 13.0  13.25 13.5  13.75 14.0  14.25 14.5  14.75 15.0  15.25 15.5  15.75 16.0  18.0 
sampling0000.inflow-scan.azimuth_table       = 195.0 195.0 195.0 195.0 193.0 193.0 193.0 193.0 191.0 191.0 191.0 191.0 189.0 189.0 189.0 189.0 187.0 187.0 187.0 187.0 185.0 185.0 185.0 185.0 183.0 183.0 183.0 183.0 181.0 181.0 181.0 181.0 179.0 179.0 179.0 179.0 177.0 177.0 177.0 177.0 175.0 175.0 175.0 175.0 173.0 173.0 173.0 173.0 171.0 171.0 171.0 171.0 169.0 169.0 169.0 169.0 167.0 167.0 167.0 167.0 165.0 165.0 165.0 165.0 0.0   0.0  
sampling0000.inflow-scan.elevation_table     = 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   45.0  45.0 
marchdf commented 1 week ago

Interesting and disappointing. I would be interested in seeing the time step timing summary and the output of the profiler at the end of the run (assuming you turned on tiny profile).

rybchuk commented 1 week ago

Here are tiny profiles for a job that ran for 8 timesteps with netcdf and 8 timesteps with native. The netcdf job didn't yet show any of the 40-second spikes that I see later in a run (mentioned in #1104).

netcdf_profile.txt native_profile.txt

marchdf commented 1 week ago

It is interesting that one case is much more load balanced than the other. I don't know why ParticleContainer::RedistributeMPI() would be that different... WriteBinaryParticleData looks awful... It is unfortunate that we can't tell which sampler is causing this. The planes? The lines? The lidar? If you wouldn't mind posting the full input file here for reference, maybe we can have someone dig into this. You could also try disabling the plane sampling and see if anything gets better. Same for the lidar... basically try to isolate the culprit.

rybchuk commented 1 week ago

Happy to share the files! I'm going to keep hunting for a quick fix. I appreciate the interest :) The simulation starts from a checkpoint, but the same behavior should happen starting from t=0 seconds. If not, I can share the checkpoint file too.

job_output_filename.4369987.txt abl_netcdf.txt runfile32.txt abl_native.txt

rybchuk commented 1 week ago

And some extra context: here's some output from a netCDF job. As the simulation goes on, timesteps start to take longer, starting at roughly 5 sec per timestep and ending at roughly 100 sec per timestep. Maybe there is some sort of memory leak happening?
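In case it helps narrow that down, here is a rough sketch for quantifying that trend directly from the job output; the timing-line regex is a placeholder and needs to be pointed at whatever line in the log actually reports the per-step wall time:

# Rough sketch for quantifying a per-step slowdown from a job log.
# The regex below is a placeholder (adapt it to the log's per-step timing line);
# everything else is plain Python.
import re
import sys

STEP_RE = re.compile(r"Total:\s*([\d.]+)")  # placeholder pattern, adjust as needed

times = []
with open(sys.argv[1]) as f:
    for line in f:
        m = STEP_RE.search(line)
        if m:
            times.append(float(m.group(1)))

n = max(1, len(times) // 10)
print(f"steps matched: {len(times)}")
print(f"mean of first {n} steps: {sum(times[:n]) / n:.2f} s")
print(f"mean of last  {n} steps: {sum(times[-n:]) / n:.2f} s")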

job_output_filename.4368588.txt