ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).
Creative Commons Zero v1.0 Universal
78 stars 57 forks source link

Prevent memory allocation during scaleAddOffset if it's unneeded #72

Closed ChrisPJohn closed 2 years ago

ChrisPJohn commented 2 years ago

I looked everywhere this is used and generally the result of the call is used to overwrite the PA that is provided as a parameter, so returning the same PA is not an issue.

During EDDGridFromNcFiles.testBigRequestSpeed this prevents the unnecesary allocation of multiple 37324800 item PrimitiveArrays which will reduce the need for garbage collection.

BobSimons commented 2 years ago

I realized scaleAddOffset is more complicated than I originally thought.

I think the point of the "always" in the original documentation for this method ("This returns a new (always) PrimitiveArray") isn't so much that the returned primitive array will always be a new PrimitiveArray. It is that the values in the source array won't be changed and, by always returning a new array, we are thus assured that the values in the source array won't ever be changed. As a general note: when I say something pointed in the documentation (like "always"), there's often a reason.

I think (but could easily be wrong) that this method is used in two main ways: 1) to modify the PrimitiveArray attribute associated with an attribute when generating the combinedAttributes (from the sourceAttributes + addAttributes). These are tiny PrimitiveArrays and we don't want the sourceAttributes to be changed. There is little cost to making a clone of the source array before calling this method. 2) to modify a data PrimitiveArray when fulfilling a request. These can be huge and it is fine if the source PrimitiveArray is changed.

I hope that makes sense. Let's keep the changes you made to this method. But, in light of this information, can you please review the uses of scaleAddOffset and add something like pa = (PrimitiveArray)pa.clone(); before calling scaleAddOffset for situations like (1) above? Or if you think that is already handled one way or another, let me know. Thanks.

I said yesterday that this job was basically low pressure. I realized there is one underlying pressure: not to make a mistake that leads to incorrect results. I doubt if anyone is double checking the results in the way that the unit tests do, so an error that we don't catch could go undetected for years. That would be really bad and destroy people's trust in ERDDAP. So I've tried hard not to make any errors like that.

I just thought I should say that even though it's probably obvious.

Best wishes.

On Mon, May 2, 2022 at 2:32 PM Chris John @.***> wrote:

I looked everywhere this is used and generally the result of the call is used to overwrite the PA that is provided as a parameter, so returning the same PA is not an issue.

During EDDGridFromNcFiles.testBigRequestSpeed this prevents the unnecesary allocation of multiple 37324800 item PrimitiveArrays which will reduce the need for garbage collection.

You can view, comment on, or merge this pull request online at:

https://github.com/BobSimons/erddap/pull/72 Commit Summary

File Changes

(1 file https://github.com/BobSimons/erddap/pull/72/files)

Patch Links:

— Reply to this email directly, view it on GitHub https://github.com/BobSimons/erddap/pull/72, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALKWOCBCDQHTSYG33T4FK3VIANSHANCNFSM5U4XVE5A . You are receiving this because you are subscribed to this thread.Message ID: @.***>