Memory consumption issue - Githubissues

ggalmazor / lt_downsampling_java8

Largest Triangle Three Buckets downsampling algorithm implementation for Java8

Other

22 stars 4 forks source link

Memory consumption issue #23

Closed capsuleman closed 9 months ago

capsuleman commented 9 months ago

Hi @ggalmazor! I submitted a PR/opened an issue with a bug in your library about a year ago. Thanks again for your library; it's been incredibly useful in my project. Now, I'm reaching out for a feature request I'd like to implement in your library 😊

Context

In my project, we utilize the down-sampling library to visualize large datasets (approximately 1,000,000 points) on a web interface that can handle only a few thousand points. These points are timeseries with two attributes: timestamp (Date) and measure (long). To integrate with your library, I need to extend this class with com.ggalmazor.ltdownsampling.Point, which uses two BigDecimal. These two additions cost at least 32b x 2 per point. About a month ago, we encountered a memory issue when multiple requests were made simultaneously. As a quick fix, we increased the application's memory allocation. However, we're now seeking a long-term solution that would be more memory-efficient.

Proposed Implementation

I'm considering creating a new class, lighter than Point, without BigDecimal attributes x and y. Instead, it would have only getters and setters performing on-the-fly conversion between the attributes and the BigDecimal value used by the algorithm. This implementation would introduce a new class without altering the core of the library. The Point class would still be available for backward compatibility.

Potential Blockers

Are there any potential blockers for developing this implementation? I'm thinking about:

Utilizing specific BigDecimal methods
Intensively using BigDecimal values, which might negatively impact algorithm performance when converting values on-the-fly multiple times
Any other considerations?

I'd be thrilled to update the library in this direction and open a PR for these changes. Do you agree with updating the library in this direction?

P.S.: Is there a chance of duplicated arrays during down-sampling (resulting in extra memory usage)?

ggalmazor commented 9 months ago

Hey @capsuleman! 👋

I'm thrilled about your proposal ❤️

Before having a chance to get into the code again to double-check, let me say that you are making really good points, and it sounds to me like we should probably deprecate Point in favor of the new value object class you are proposing. We could try Double arithmetic for any operation we're doing now as well to avoid having to lift numbers to BigDecimal. From the top of my head, we shouldn't lose anything important.

Let me check the code and get back to you asap

🙌

ggalmazor commented 9 months ago

It looks like it's totally doable. Check my spike branch at https://github.com/ggalmazor/lt_downsampling_java8/tree/spike/migrate_BigDecimal_to_double

I'd love it if you could create a PR with these changes. After a second thought, we might not need to deprecate Point. We might just overload the constructor/factories to transform BigDecimal values to Double. We could also explore downgrading the type requirements to Number, but I'm not sure if that's going to work with BigDecimal.

capsuleman commented 9 months ago

Hey @ggalmazor! 👋 I did not see you reply sorry! Great work for your migration, that so cool! Maybe you can create a PR for the changes you added 😄 I should open a PR with the changes I mentioned. Point should not be deprecated. I was wondering about creating an abstract class AbstractPoint with these getter / setter methods. Point would implement this abstract class.

ggalmazor commented 9 months ago

Alright, that makes sense, @capsuleman

Regarding the AbstractPoint, let me get back to you on that. I normally would like to avoid using an abstract class if the goal is to reuse code.

I'll also create a PR with my spike branch shortly

ggalmazor commented 9 months ago

Alright @capsuleman, this is a summary of what I've settled on:

Point is an interface now. Users only need to implement double getX and double getY()
There's a DoublePoint implements Point class now that users can use if they want, or build their Point implementation.
- DoublePoint is internally used by the lib for all computations but it doesn't go beyond that.

I've created a PR with this at https://github.com/ggalmazor/lt_downsampling_java8/pull/24, and I've invited you to the repo so that I can tag you as a reviewer (only if you want, of course). We can discuss details about the approach there.

When this PR gets merged I'll release this as 0.1.0 (I think it deserves a MINOR version bump since it's basically a breaking change.

ggalmazor commented 9 months ago

P.S.: Is there a chance of duplicated arrays during down-sampling (resulting in extra memory usage)?

I've seen Stream doing strange things in the past. I can't be sure, to be honest, and this would require some profiling research, which I can't address at this time.

Maybe we can come back to this topic once we get the Double port in place.