Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing

Error in makeSummaryDepth with large capture sets #99

Closed bkohrn closed 1 year ago

bkohrn commented 3 years ago

We have observed errors resembling the following with large capture sets:

/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/Image.py:2850: DecompressionBombWarning: Image size (177263376 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  DecompressionBombWarning,
Maximum supported image dimension is 65500 pixels
Traceback (most recent call last):
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/ImageFile.py", line 510, in _save
    fh = fp.fileno()
io.UnsupportedOperation: fileno
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/kohrnb/bioinformatics/Duplex-Seq-Pipeline-v2.0.0/scripts/plotSummaryDepth.py", line 15, in <module>
    myImages[0].save(f'{o.config}.summaryDepth.pdf', save_all=True, append_images=myImages[1:])
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/Image.py", line 2164, in save
    save_handler(self, fp, filename)
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/PdfImagePlugin.py", line 41, in _save_all
    _save(im, fp, filename, save_all=True)
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/PdfImagePlugin.py", line 172, in _save
    Image.SAVE["JPEG"](im, op, filename)
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/JpegImagePlugin.py", line 761, in _save
    ImageFile._save(im, fp, [("jpeg", (0, 0) + im.size, 0, rawmode)], bufsize)
  File "/home/kohrnb/miniconda3/envs/DS_full_v2_dev/lib/python3.6/site-packages/PIL/ImageFile.py", line 529, in _save
    raise OSError(f"encoder error {s} when writing image file") from exc
OSError: encoder error -2 when writing image file

(Note: this error was produced using a run environment from the development version of the pipeline, but the program itself was from the production version.)

We have determined that this error arises from a limitation in how the Python Imaging Library (PIL) writes images to PDF files: PIL encodes images into the PDF using the JPEG codec, which has a maximum supported image dimension of 65,500 pixels. In this case, the capture set had 106 regions defined in the BED file, resulting in an image height of approximately 75,119 pixels (roughly 709 pixels per region), well above that limit. In practice, this means the maximum number of regions that can be accommodated while still producing a summary depth file is about 90.
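The arithmetic behind the "about 90" figure can be sketched as follows. This is a rough estimate only: it assumes the image height scales linearly with the number of regions, and it uses the approximate height quoted above, so the result should be read as a ballpark rather than an exact cutoff. The variable names are illustrative and do not come from the pipeline.

```python
# Rough estimate of how many regions fit under the JPEG dimension limit.
# Assumption: image height grows linearly with the number of BED regions.

JPEG_MAX_DIM_PX = 65500        # limit reported in the error output
observed_regions = 106         # regions in the failing capture set
observed_height_px = 75119     # approximate image height for that capture set

# Average vertical space used per region (about 708.7 pixels).
px_per_region = observed_height_px / observed_regions

# Largest whole number of regions whose total height stays under the limit.
max_regions = int(JPEG_MAX_DIM_PX // px_per_region)

print(f"~{px_per_region:.1f} px per region, at most {max_regions} regions")
```

The estimate lands slightly above 90; rounding down to 90 leaves some margin for the approximation in the measured height.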

This issue affects only the creation of the summary depth file; every other part of the pipeline should finish properly. We have a solution, but it cannot be implemented in v2.X.X pipelines because it requires adding a new module to the run environment. For the time being, treat 90 as the maximum number of regions that can be defined in the BED file.
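Until the fix is available, one way to catch this before a run is to count the regions in the BED file and compare against the suggested limit. The following is a minimal sketch, not part of the pipeline: the function names and the `SUGGESTED_MAX_REGIONS` constant are hypothetical, and it assumes each non-blank line that is not a comment, `track`, or `browser` line defines one region.

```python
# Hypothetical pre-flight check; none of these names exist in the pipeline.

SUGGESTED_MAX_REGIONS = 90  # working limit suggested in this issue


def count_bed_regions(lines):
    """Count region lines, skipping blanks, comments, and BED header lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        count += 1
    return count


def exceeds_summary_depth_limit(lines, limit=SUGGESTED_MAX_REGIONS):
    """Return True if the BED content defines more regions than the limit."""
    return count_bed_regions(lines) > limit


# Usage with in-memory lines; with a real file, pass the open file handle,
# e.g. `with open(bed_path) as fh: exceeds_summary_depth_limit(fh)`.
example = ["# capture set", "chr1\t100\t200\tregionA", "", "chr2\t300\t400\tregionB"]
print(count_bed_regions(example), exceeds_summary_depth_limit(example))
```

A capture set that fails this check would still run through the rest of the pipeline; only the summary depth PDF would be expected to fail.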

This issue has already been fixed in the development version that is leading towards v3.0.0.