cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot
https://api.cofacts.tw
MIT License
110 stars 26 forks source link

Use buffer instead of cloned stream #279

Closed MrOrz closed 2 years ago

MrOrz commented 2 years ago

Current implementation of CreateMediaArticle will halt on file larger than a certain size.

It is halt because await file.clone().buffer() never resolves.

The root cause is that when we call await file.clone().buffer(), it reads the cloned stream and put its data in an internal buffer so that the original stream can still consume data afterwards. The read operation halts when the internal buffer is full.

Related issue:

Analysis

In the current implementation we tried to use stream so that we don't need to load the entire file in memory nor in file system.

However, image-hash package requires a buffer as its input. Even if we provide a file path to it, it will read the entire file into memory (then send to JPEG decoder) anyway. Therefore, it is inevitable that we put the entire file in memory.

Proposed fix

This PR downloads the entire file from mediaUrl into a buffer and then pass to image-hash.

We set a max limit for the downloaded file (5MB for now) so that the API server will not run out of memory when someone send us a super large file. If a file bigger than the limit is provided, CreateMediaArticle will fail.

coveralls commented 2 years ago

Coverage Status

Coverage decreased (-0.06%) to 87.473% when pulling f6cac742a6e49947acb78e795d3fe8b271a55b26 on use-buffer into e0510ecbc74c7737663340ca8231664149005454 on fix-build-by-upgrade.