Current implementation of CreateMediaArticle will halt on file larger than a certain size.
It is halt because await file.clone().buffer() never resolves.
The root cause is that when we call await file.clone().buffer(), it reads the cloned stream and put its data in an internal buffer so that the original stream can still consume data afterwards. The read operation halts when the internal buffer is full.
In the current implementation we tried to use stream so that we don't need to load the entire file in memory nor in file system.
However, image-hash package requires a buffer as its input. Even if we provide a file path to it, it will read the entire file into memory (then send to JPEG decoder) anyway. Therefore, it is inevitable that we put the entire file in memory.
Proposed fix
This PR downloads the entire file from mediaUrl into a buffer and then pass to image-hash.
We set a max limit for the downloaded file (5MB for now) so that the API server will not run out of memory when someone send us a super large file. If a file bigger than the limit is provided, CreateMediaArticle will fail.
Coverage decreased (-0.06%) to 87.473% when pulling f6cac742a6e49947acb78e795d3fe8b271a55b26 on use-buffer into e0510ecbc74c7737663340ca8231664149005454 on fix-build-by-upgrade.
Current implementation of
CreateMediaArticle
will halt on file larger than a certain size.It is halt because
await file.clone().buffer()
never resolves.The root cause is that when we call
await file.clone().buffer()
, it reads the cloned stream and put its data in an internal buffer so that the original stream can still consume data afterwards. The read operation halts when the internal buffer is full.Related issue:
Analysis
In the current implementation we tried to use stream so that we don't need to load the entire file in memory nor in file system.
However,
image-hash
package requires a buffer as its input. Even if we provide a file path to it, it will read the entire file into memory (then send to JPEG decoder) anyway. Therefore, it is inevitable that we put the entire file in memory.Proposed fix
This PR downloads the entire file from
mediaUrl
into a buffer and then pass to image-hash.We set a max limit for the downloaded file (5MB for now) so that the API server will not run out of memory when someone send us a super large file. If a file bigger than the limit is provided,
CreateMediaArticle
will fail.