CDAT / cdat

Community Data Analysis Tools

Address cdms2 slow read from multiple netcdf files accessed through spanning xml #479

Open durack1 opened 10 years ago

durack1 commented 10 years ago

It appears cdms2 is taking a very long time (tens of minutes) to read a subset of data from multiple netcdf files containing a continuous time axis. In particular, the issue appears with very high-resolution (daily, 2700 x 3600) surface ocean fields.

The slow read also happens when addressing the netcdf file(s) directly.

The issue is apparent on the GFDL computing systems and has also been replicated on a workstation physical disk, so it's likely a cdms2 problem rather than anything hardware-related.
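
For reference, the access pattern that triggers the slow read looks roughly like the minimal sketch below; the xml name, variable name, and subset bounds are stand-ins, not the actual GFDL case.

```python
# Minimal sketch of the slow access pattern (paths/variable names hypothetical).
import cdms2

f = cdms2.open("ocean_daily_spanning.xml")   # xml spanning many daily netcdf files
# Reading a small lat band of a 2700 x 3600 daily field for one year takes
# tens of minutes rather than the expected seconds.
tos = f("tos", time=("2000-1-1", "2000-12-31"), latitude=(-10, 10))
f.close()
```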

doutriaux1 commented 10 years ago

I suspect the time dimension is the killer; will report here as soon as we find a solution, or at least a reason why.

durack1 commented 10 years ago

With compressed (and potentially shuffled) netcdf4 files likely coming down the pipe with CMIP6, it could be a good time to revisit how cdms2 calls the netcdf C API; if some of the compression/shuffling work is better done by the library than by Cdunif, then it would be worth switching to the library. And if data reads can be sped up by reading across certain dimensions (it appeared that reading all lons and subsetting lats was 2x faster than the inverse), it would be great if a read call did that automatically for the user.
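
As a rough illustration of that read-order comparison, a timing sketch against a single netcdf file directly might look like this (file and variable names are hypothetical):

```python
# Rough timing of the two read orders mentioned above (names hypothetical).
import time
import netCDF4

ds = netCDF4.Dataset("ocean_daily.nc")   # hypothetical high-resolution file
var = ds.variables["tos"]                # e.g. shape (time, lat, lon) = (365, 2700, 3600)

t0 = time.time()
band_lat = var[0, 1000:1100, :]          # subset lats, read all lons
t1 = time.time()
band_lon = var[0, :, 1000:1100]          # read all lats, subset lons
t2 = time.time()

print("lat-subset / all-lon read: %.2fs" % (t1 - t0))
print("lon-subset / all-lat read: %.2fs" % (t2 - t1))
ds.close()
```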

aashish24 commented 10 years ago

:+1: Would using parallel netcdf help?

doutriaux1 commented 10 years ago

I don't think so in this case; it would probably help a bit, but not much. Most of the wasted time (I believe) is spent reading the time dimension from each file and recomposing the time objects. Taking cdunif out is pretty much out of the question since it is what allows us to read multiple formats. But @aashish24, you're right, we should definitely take advantage of parallel netcdf anyway.
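
One way to test that hypothesis is to time just the per-file time-axis reads and decoding, e.g. the sketch below (file pattern and variable name are hypothetical):

```python
# Time only the time-axis reads across all files (names hypothetical).
import glob
import time
import netCDF4

t0 = time.time()
for path in sorted(glob.glob("ocean_daily_*.nc")):
    with netCDF4.Dataset(path) as ds:
        tvar = ds.variables["time"]
        dates = netCDF4.num2date(tvar[:], tvar.units)   # decode into date objects
print("time-axis scan took %.2fs" % (time.time() - t0))
```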

durack1 commented 9 years ago

You folks might want to assign this a 2.1 milestone to get it off the to-do list.

aashish24 commented 9 years ago

Done

doutriaux1 commented 9 years ago

@durack1 it's not a fix, but since our files are nicely ordered by name, using the -j option makes the re-reading fast.

durack1 commented 9 years ago

I'll take another look at this when I go and look at the GFDL data again; I do think that it's going to cause problems in the CMIP6+ timeframe..

doutriaux1 commented 9 years ago

@painter1 that is the bug I think is hitting us on rhea. Will take a look tomorrow.

doutriaux1 commented 9 years ago

@durack1 are the files still around? @painter1's thread seems to indicate -j helps. I can't reproduce this locally on a tiny example.

painter1 commented 9 years ago

When I looked at the -j option, it didn't seem to do much of anything relative to the default behavior, except that it disables the default linearizing of time for very short time units. Maybe I missed something.

durack1 commented 9 years ago

@doutriaux1 yep - ocean:/export/doutriaux1/Paul we last discussed this back in June last year..

durack1 commented 9 years ago

@doutriaux1 can you assign the enhancement label please?

doutriaux1 commented 9 years ago

@dnadeau4 assigning this to you but let's work on it together

dnadeau4 commented 9 years ago

I think it affects THREDDS access as well (#1475). I would like to read only one time dimension at a time instead of loading everything into memory at once. Will start investigating a solution.

durack1 commented 9 years ago

@dnadeau4 happy to run any tweaks you have over the test case; we have it pretty well reproduced..

dnadeau4 commented 8 years ago

I have made progress on this parallel issue. Everything is working well. In a parallel world, every node must open the file, create the dimensions, and create the variables. Only the slicing differs for each one.

Please run testing/cmds2/test_mpi_write_2.py on branch issue_479.
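
For anyone without the branch checked out, the pattern described above looks roughly like the sketch below, written against netCDF4-python/mpi4py directly rather than the cdms2 API on issue_479; it assumes a parallel-enabled netcdf/hdf5 build, and the file and variable names are hypothetical.

```python
# Sketch of the parallel-write pattern: every rank opens the same file and
# defines identical dimensions/variables; only the write slice differs.
from mpi4py import MPI
import numpy
import netCDF4

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size

ds = netCDF4.Dataset("parallel_out.nc", "w", parallel=True,
                     comm=comm, info=MPI.Info())
ds.createDimension("time", size)
ds.createDimension("y", 4)
var = ds.createVariable("data", "f8", ("time", "y"))

var[rank, :] = numpy.full(4, rank, dtype="f8")   # each rank writes its own row
ds.close()
```

Run it with something like `mpirun -n 4 python sketch.py`.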

durack1 commented 8 years ago

@dnadeau4 are the tweaks on the branch above addressing the slow read times which were the focus of this original issue?

dnadeau4 commented 8 years ago

This is a write-in-parallel program; it has nothing to do with the original issue.

The original issue will need a re-architecture of CDMS. CDMS reads the entire array into memory. I would like to change the code to return an empty handler and read slices of the array per user request. Of course, if the user asks for the entire array it will still be slow, but it could be possible to read in parallel using multiple nodes. Something to think about.
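
A minimal sketch of the slice-on-request access pattern being proposed (names hypothetical; guaranteeing that the read really is deferred until each slice is requested is exactly what the re-architecture would have to provide):

```python
# Sketch of per-request slice reads (names hypothetical): the variable object is
# only a handle, and each indexing operation pulls just that slab from disk.
import cdms2

f = cdms2.open("ocean_daily_spanning.xml")
tos = f["tos"]            # handle only; ideally no payload data read yet
first_week = tos[0:7]     # read just these seven time steps
one_day = tos[100]        # read a single time step
f.close()
```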

durack1 commented 8 years ago

@dnadeau4 if there is no "fix" for this issue, aside from a re-architecture of cdms2, then let's close this issue and add the large-grid test to the suite so it is exercised when cdms2 is updated..