Mailtrain-org / mailtrain

Self hosted newsletter app
GNU General Public License v3.0
5.53k stars 692 forks source link

Pictures take long to load #1159

Closed skeleton55 closed 8 months ago

skeleton55 commented 3 years ago

Just updated to mailtrain Build 2021-05-25-0915 from the V2 beta and the pictures in the newsletters take very long to load. The smaller pictures that are below 200px width load nearly instantly but pictures larger than that take 5+ seconds to load. Nearly all of that time is spent receiving(even though the picture is less than 400KB). image Looking at the server with strace as the image is being sent it seems that mailtrain sends nearly everything and then just waits for 5 seconds before sending the last packet. Any idea what could be causing this?

Thanks in advance.

bures commented 3 years ago

When the image is downloaded first, it is resized to the correct width. Isn't it this case?

skeleton55 commented 3 years ago

The images are downloaded with the correct width and changing the width of the larger images by changing the params value in the url makes them load fast. Conversely, changing the params of the smaller images to make them large makes them load slow. Maybe a cache issue?

bures commented 3 years ago

Could you share a URL of an image that takes too long to load?

skeleton55 commented 3 years ago

For example this image: https://lists.lserv.at/mosaico/img?src=https%3A%2F%2Flists.lserv.at%2Ffiles%2Fcampaign%2Ffile%2F36%2F551137036d2cc6478898250a1236fb2d&method=resize&params=500,null

skeleton55 commented 3 years ago

It seems that this issue is related to the file cache. The file cache in the db is deleted right after it is created but if we comment out line 155 in server/lib/file-cache.js (starts with knex('file_cache')...) the images take long to load only once, then they are reasonably fast.

bures commented 3 years ago

This is strange. Could you please check the value of err that the destroy function takes as a parameter? Simply put console.log(err) on line server/lib/file-cache.js:150.

skeleton55 commented 3 years ago

console.log(err) is null it seems that the whole 5 seconds come from file-cache.js:103 changing setTimeout(callback, 5000)), to some other value changes how long it takes the image to load the first time. We do not yet know why this happens but playing around with the timeout we could see that the temp files in /tmp/ (e.g. "DEBUG - tmpFilePath - /tmp/tmp-1720uYNJ3URLBq02") are created but stay at 0 bytes for the whole duration until the timeout is reached. The Destroy function is probably called because we reach the timeout. Furthermore the entries in the file_cache table seem to be skipping 2 IDs for each entry: | 6212 | mosaico-images | 6215 | mosaico-images | 6218 | mosaico-images | 6221 | mosaico-images

fredluhamaa commented 2 years ago

I have exact same issue on fresh install. Initial thought issue is with ImageMagick, updated to ImageMagick 7.1.0-32 still no luck. Seems that cached images are not used after resizing, I see in cache folder many entries.

Any suggestions how to debug?

talheim-it commented 2 years ago

@fredluhamaa deployment method you used?

fredluhamaa commented 2 years ago

Manual install through git on Centos 7. nginx - v1.21.6 as proxy node - v14.19.1 npm - 6.14.17

Everything else runs perfectly, just images are slow to render. Weird thing that in log it's "fast": node[29312]: info HTTP GET /mosaico/img?src=https%3A%2F%2Fnewsletter.exampledomain.com%2Ffiles%2Fcampaign%2Ffile%2F4%2Ffd51e6a20d2736ca169dfb8efe6dff...1.761 ms

talheim-it commented 2 years ago

Ok, can you tell me how you see it? Maybe I can check it on my ubuntu server where I have running my instances.

fredluhamaa commented 2 years ago

From service status "log" and chrome dev tool timing.

mailtrain-status mailtrain-chrome-render

skeleton55 commented 2 years ago

In our case: we couldn't figure out in the end why exactly this happens, so we just use the workaround described above : comment out line 155 in mailtrain/server/lib/file-cache.js(the one that starts with: knex(file_cache').where...) and change setTimeout(callback, 5000) on line 103 in the same file to a lower value.

arqtgn-informatica commented 2 years ago

Same happening in our case, with both docker and installation. The problem solved when we use the same size of the image of the template. I think that if we use the 570px width of the template all goes well, but if we put a 1000px or more it have to resize everytime and gets frozen for 5 seconds... sometimes goes well and sometimes not...

gerhardsletten commented 1 year ago

We have also experienced this, and while looking into the issue, there are maybe several issues with server/lib/file-cache.js.

@bures Let me know if I should make a PR to fix these issues

(1) If an image is not served from cache it will be delivered as a stream, and for some reason ensureFileStream() has a 5000ms timeout which can be observed as the first part of the image is send, then you will wait for 5 sec before the image is fully loaded. This will depend of the image-size and whether the whole images can be delivered at once or if it is broken up in chunks. But the connection will probably always last for at least 5 sec.

Did try to set the timeout = 0, and it looks like delivery works as it should.

const ensureFileStream = callback => {
    if (!fileStream) {
        tmpName().then(tmp => {
            tmpFilePath = tmp;
            fileStream = fs.createWriteStream(tmpFilePath);
            setTimeout(callback, 5000);
        })
    } else {
        callback();
    }
};

(2) Newly cached images is immediately removed from db. If not found in cache it is stored here:

await knex.transaction(async tx => {
    const existingFileEntry = await knex('file_cache').where('type', typeId).where('key', key).first();

    if (!existingFileEntry) {
        const ids = await tx('file_cache').insert({type: typeId, key, mimetype: res.getHeader('Content-Type'), size: fileSize});
        await fs.moveAsync(tmpFilePath, getLocalFileName(ids[0]), {});
        mayNeedPruning = true;
    } else {
        await fs.unlinkAsync(tmpFilePath);
    }
});

But later in the destroy() fn the same image in destroyed, resulting in all images served as stream/chunked.

destroy(err, callback) {
    res.destroy(err);

    if (fileStream) {
        fileStream.destroy(err);
        fs.unlink(tmpFilePath, () => {
            knex('file_cache').where('type', typeId).where('key', key).del().then(()=> callback());
        });
    } else {
        callback();
    }
}

(3) The pruneCache() fn are removing the latest cached files, since the query is ordered by id, descending . It probably should start in the beginning of the table instead, so the least used files is removed first.

Edit: On later investigation this behavior is correct, because pruneCache() will end up removing cache entries at the end of iterations, and this will be the entries with lower id's

const pruneCache = async() => {
    if (mayNeedPruning) {
        try {
            const maxSize = cacheConfig.maxSize * 1048576;

            let lastId = null;
            let cumulativeSize = 0;

            while (true) {
                let entriesQry = knex('file_cache').where('type', typeId).orderBy('id', 'desc').limit(pruneBatchSize);
                if (lastId) {
                    entriesQry = entriesQry.where('id', '<', lastId);
                }

                const entries = await entriesQry;

                if (entries.length > 0) {
                    for (const entry of entries) {
                        cumulativeSize += entry.size;
                        if (cumulativeSize > maxSize) {
                            await fs.unlinkAsync(getLocalFileName(entry.id));
                            await knex('file_cache').where('id', entry.id).del();
                        }

                        lastId = entry.id;
                    }
                } else {
                    break;
                }
            }
        } catch (err) {
            log.error('FileCache', err);
        }

        mayNeedPruning = false;
    }
};

You can inspect the size of the cache in mysql-cli with this query:

select sum(size/1048576) from file_cache;
+-------------------+
| sum(size/1048576) |
+-------------------+
|          346.4859 |
+-------------------+

If you are close to the 1024mb default limit, you should increase the maxSize below mosaico > images

mosaico:
  # Installed templates
  fsTemplates:
  - key: versafix-1
    label: Versafix One
  # Inject custom scripts
  # customscripts:
  # - /mosaico/custom/my-mosaico-plugin.js
  fileCache:
    blockThumbnails:
      maxSize: 100 # megabytes
      pruneInterval: 60 # seconds
    images:
      maxSize: 1024 # megabytes <-- (increase this)
      pruneInterval: 60 # seconds
bures commented 1 year ago

@gerhardsletten, yes please. If you have a fix, make a PR.

gerhardsletten commented 1 year ago

@bures Added here #1309

talheim-it commented 8 months ago

We are going to start with the development and testing of mailtrain v3 in the next weeks.

You are welcome to help us with the testing as soon as the first release candidate is available.