CBLRIT / ECG-Viewer

Opens and manipulates raw ECG data
GNU General Public License v2.0

Unable to open large files: Memory Inefficiency/Specification Change #5

Open MJE10 opened 2 years ago

MJE10 commented 2 years ago

Background: The program was originally designed for small ECG data files, on the order of 10-100 MB. These files work as intended. It is now being asked to process much larger files, on the order of 1-2 GB.

Problem: To graph a file point for point, the current implementation needs at least as much RAM as the file's size, and likely several times that. As a result, the JVM runs out of memory before it finishes reading a large file. The program then fails silently so that users can try a different operation.

Possible Solutions:

1) Increase the amount of RAM available to the JVM. Passing the -Xmx option (for example, -Xmx4g) raises the maximum heap size above the default. This would certainly allow larger files, but there is no good way to predict exactly how much larger. If you believe your files are right on the edge of what's possible, this might be worth a shot.

2) Process data channel by channel. This splits up the work so that not all data is loaded at once, while each image can still be built from a single array by the graphing software. The downside is that applying any of the operations the program offers would require reloading the data from disk instead of working from memory.

3) Decrease graphing precision. On the timescales that these large files record, it is unlikely that researchers need a point-for-point accurate graph. Significant improvements could therefore be made by quantizing the data into a smaller number of chunks, each covering many data points in the original set but graphed as a single point.

4) Preprocessing. By first splitting the data into smaller files, the program could divide its work into sections and process one section at a time. Like option 2, this could make loading slower.

5) Refactor internal data structures. The program was not originally written to handle these large files efficiently, and changes could be made to improve how the data is manipulated.
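For option 1, the heap limit can be raised at launch with something like `java -Xmx4g -jar ECG-Viewer.jar` (the jar name here is a placeholder, not taken from the project). The sketch below shows one way the program could check up front whether a file is likely to fit, instead of failing silently. `HeapCheck` and the overhead factor are hypothetical; the real multiple of file size that the program needs has not been measured.

```java
// Hypothetical helper, not part of ECG-Viewer.
final class HeapCheck {
    private HeapCheck() {}

    /** Maximum heap the JVM will attempt to use, in bytes (this reflects -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Rough pre-flight check: does fileBytes * overheadFactor fit in the heap?
     * overheadFactor is an assumed guess at how many times the file size
     * the in-memory representation needs.
     */
    static boolean likelyFits(long fileBytes, double overheadFactor, long maxHeapBytes) {
        if (fileBytes < 0 || overheadFactor <= 0) {
            throw new IllegalArgumentException("fileBytes must be >= 0 and overheadFactor > 0");
        }
        return fileBytes * overheadFactor <= maxHeapBytes;
    }
}
```

A caller could run `HeapCheck.likelyFits(file.length(), 3.0, HeapCheck.maxHeapBytes())` before loading and show a warning when it returns false.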
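Option 2 could look roughly like the sketch below, which streams the input once and keeps only one channel in memory. It assumes a delimited text layout with one sample per row and one column per channel; the actual ECG file format is not described in this issue, and `ChannelReader` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.util.Arrays;
import java.util.Iterator;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ECG-Viewer.
final class ChannelReader {
    private ChannelReader() {}

    /**
     * Reads a single column (channel) from delimited text, line by line,
     * so only that channel's samples are held in memory.
     * Assumed layout: one sample per row, one column per channel.
     */
    static float[] readChannel(BufferedReader in, int channel, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        float[] values = new float[1024];
        int n = 0;
        // BufferedReader.lines() reports I/O failures as UncheckedIOException.
        Iterator<String> lines = in.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.trim().isEmpty()) {
                continue; // skip blank rows
            }
            String[] fields = splitter.split(line);
            if (channel < 0 || channel >= fields.length) {
                throw new IllegalArgumentException(
                        "Channel " + channel + " not present in row with " + fields.length + " columns");
            }
            if (n == values.length) {
                values = Arrays.copyOf(values, n * 2); // grow the primitive array
            }
            values[n++] = Float.parseFloat(fields[channel].trim());
        }
        return Arrays.copyOf(values, n);
    }
}
```

The trade-off described in option 2 is visible here: every channel switch, and every operation that needs another channel, costs a full pass over the file.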
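A minimal sketch of option 3 follows. Option 3 describes one plotted point per chunk; this variation keeps the minimum and maximum of each chunk instead, on the assumption that short spikes in an ECG trace should stay visible after reduction. `Decimator` is a hypothetical name and this is not the project's graphing code.

```java
// Hypothetical helper, not part of ECG-Viewer.
final class Decimator {
    private Decimator() {}

    /**
     * Reduces samples to at most `buckets` chunks and returns
     * [min0, max0, min1, max1, ...], one min/max pair per chunk.
     */
    static float[] minMax(float[] samples, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        if (samples.length == 0) {
            return new float[0];
        }
        // Never use more chunks than samples, so every chunk is non-empty.
        int n = Math.min(buckets, samples.length);
        float[] out = new float[2 * n];
        for (int b = 0; b < n; b++) {
            // long arithmetic avoids int overflow on very large arrays
            int start = (int) ((long) b * samples.length / n);
            int end = (int) ((long) (b + 1) * samples.length / n);
            float min = samples[start];
            float max = samples[start];
            for (int i = start + 1; i < end; i++) {
                min = Math.min(min, samples[i]);
                max = Math.max(max, samples[i]);
            }
            out[2 * b] = min;
            out[2 * b + 1] = max;
        }
        return out;
    }
}
```

Choosing `buckets` close to the plot width in pixels keeps the array handed to the graphing code small regardless of file size, although the full channel still has to be read once to compute it.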

At this time, there is no plan to implement any of the solutions mentioned here. Focus will be kept on making sure you can still open small parts of large files. Although most of the discussion concerning this issue will take place internally in CBLRIT, open-source comments and contributions are welcome.

MJE10 commented 2 years ago

See #6 for a related issue with reading subsets, which will be resolved soon.

MJE10 commented 2 years ago

#6 is completed.