biigle / reports

:m: BIIGLE module to generate reports for projects and volumes
GNU General Public License v3.0

Split full report if one file would have too many rows #66

Open mzur opened 4 years ago

mzur commented 4 years ago

Reject a full report request if it would contain more than 50,000 annotations for a single volume. Even though the full report Python script is now more memory efficient, such a report can generate extremely large temporary files (>1.5 GB for a volume with 80,000 annotations) and can contain more rows than Excel or Calc can handle (if freehand polygons are used).

mzur commented 2 years ago

Even better would be a dynamic limit. Find out what the maximum number of rows is that Excel/Calc can handle. Then split the report into multiple files if the number of rows in a single file would be too large.

dlangenk commented 2 years ago

According to Microsoft Support, the Excel limit is 1,048,576 rows by 16,384 columns per worksheet.

mzur commented 2 years ago

Your reference suggests that we could split the report into multiple worksheets instead of multiple files. This would be much easier. However, I think we do the worksheet split for some other cases, too (split by label tree?). I can't recall exactly.
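
A minimal sketch of the splitting idea, whether the chunks end up as worksheets or as files. It assumes the report is available as an iterable of rows plus a header row; `split_rows` and its parameters are hypothetical names and not part of the existing report script.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

# Row limit of a single Excel worksheet, as quoted from Microsoft Support above.
EXCEL_MAX_ROWS = 1_048_576


def split_rows(
    rows: Iterable[Sequence],
    header: Sequence,
    max_rows: int = EXCEL_MAX_ROWS,
) -> Iterator[List[Sequence]]:
    """Yield chunks of rows that each fit into one worksheet.

    Every chunk starts with the header row, so it holds at most
    max_rows - 1 data rows.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    iterator = iter(rows)
    while True:
        # Consume the input lazily so the full report never sits in memory.
        chunk = list(islice(iterator, max_rows - 1))
        if not chunk:
            break
        yield [list(header), *chunk]
```

Each yielded chunk could then be written to its own worksheet or file. The open question from the thread remains: worksheets may already be used for other splits, so the chunk index would have to be combined with that.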

mzur commented 2 years ago

There is no easy fix for this. It could be solved in three different ways, but none of them is straightforward:

  1. Split the XLSX into multiple files: There is no concept of a single report that consists of multiple files, so this would require significant work.
  2. Use multiple worksheets: Worksheets are already used if the report should be split by label tree or user, so a split by size would have to be combined with these existing splits.
  3. Deny request if report would be too big: "Too big" depends on the number of annotations and on the number of annotation coordinates. A report could contain 1M point annotations or only a few thousand freehand polygon annotations, so a hard limit for the number of annotations does not really make sense. Validation of a request that checks the number of annotation coordinates would be quite slow and/or complex, I think (count the commas in the points column?).
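
The comma-counting idea from option 3 can be sketched as follows. This is only an illustration under stated assumptions: the points column is assumed to hold a flat JSON-style array string such as `[10,20,30,40]`, and `values_per_row` (how many coordinate values end up in one report row) is a hypothetical parameter, since the thread does not spell out the exact row layout.

```python
import math
from typing import Iterable

# One worksheet holds 1,048,576 rows; one of them is needed for the header.
MAX_DATA_ROWS = 1_048_576 - 1


def count_values(points: str) -> int:
    """Count the values in a flat array string by counting its commas."""
    stripped = points.strip().strip("[]").strip()
    if not stripped:
        return 0
    return stripped.count(",") + 1


def exceeds_limit(
    points_column: Iterable[str],
    max_rows: int = MAX_DATA_ROWS,
    values_per_row: int = 2,  # assumption: one row per x/y pair
) -> bool:
    """Return True as soon as the estimated row count passes max_rows."""
    total = 0
    for points in points_column:
        total += math.ceil(count_values(points) / values_per_row)
        if total > max_rows:
            return True  # stop early, no need to scan the rest
    return False
```

This still has to touch every annotation of the volume in the worst case, which matches the concern that such a validation would be slow. Doing the same count in the database query might be faster, but that is not shown here.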

mzur commented 1 year ago

Another idea: Change the report to contain the array of coordinates in a single cell (like the CSV report). Offer a checkbox that makes the old behavior opt-in for backwards compatibility. Communicate this to the users.
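
A rough sketch of what the opt-in could look like on the script side. The row layout used for the old behavior (one row per coordinate pair) is an assumption for illustration only, and `annotation_rows` and `expand_points` are hypothetical names standing in for the checkbox.

```python
import json
from typing import List, Sequence


def annotation_rows(
    annotation_id: int,
    label: str,
    points: Sequence[float],
    expand_points: bool = False,
) -> List[list]:
    """Build the report rows for one annotation.

    By default the whole coordinate array goes into a single cell as a
    JSON string, so every annotation takes exactly one row.
    """
    if not expand_points:
        return [[annotation_id, label, json.dumps(list(points))]]
    # Assumed old layout: one row per coordinate pair (opt-in).
    pairs = [list(points[i:i + 2]) for i in range(0, len(points), 2)]
    return [[annotation_id, label, json.dumps(pair)] for pair in pairs]
```

With the default, the row count equals the number of annotations, so the size of a report becomes predictable from the annotation count alone.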