Automattic / mShots

Website Thumbnail/Snapshot Service
GNU General Public License v2.0
100 stars 21 forks source link

Add logic to the url-analyzer job #59

Closed jeroenpf closed 2 years ago

jeroenpf commented 2 years ago

This PR is part of a series (#53) of PRs and is preceded by #55.

Logic is added to extract colors from a website. It does so by trying to get background and link colors from the target URL and extracting dominant colors from the logo and favicon. This logic is ported from an earlier proof of concept we did.

Test instructions

Firstly, it is important that the version of Node on the host machine matches the version of Node inside the container. Currently, the version is 14.19.1 - if these versions don't match, npm install will result in broken packages if ran from the host machine. If matching the versions isn't possible, running npm clean-install inside the container will also work.

Run npm install to install new dependencies.

To test, simply hit the url-analyzer external endpoint with a URL, for example http://localhost:8000/url-analyzer?url=https://example.com

roo2 commented 2 years ago

I left some comments on https://github.com/Automattic/mShots/pull/56 which apply to this PR, it looks like they implement a lot of the same code

roo2 commented 2 years ago

node-canvas binaries aren't available for M1 macs', I tried installing them but then got issues where the application wouldn't start because it had been compiled for the wrong architecture. I tried running npm install --target_arch=arm64 --target_platform=linux etc and a few other things but haven't been able to run this from my machine.

vishnugopal commented 2 years ago

Think I'm running into the same issue as @roo2, when I try to run this, the worker restarts continuously:

[2022-04-08T10:14:27.248] [DEBUG] flog - The worker thread #5655 (pid:45410) is online.
[2022-04-08T10:14:27.260] [DEBUG] flog - worker thread #5655 (pid:45410) has disconnected.
[2022-04-08T10:14:27.260] [DEBUG] flog - worker thread #5655 (pid:45410) died (1). restarting.
[2022-04-08T10:14:27.260] [DEBUG] flog - sending "false" to forced reload parameter
[2022-04-08T10:14:27.262] [DEBUG] flog - The worker thread #5656 (pid:45417) is online.

and the result is always pending from the API:

{"http_status":200,"data":{"url":"https:\/\/example.com","status":"pending"}}
roo2 commented 2 years ago

@vishnugopal I was able to get the error message by adding the -C flag to the start-stop-daemon command in mshots_ctl.sh:72

start-stop-daemon -S -C -q -m -b --pidfile /var/run/mshots.pid --exec /usr/local/node/bin/node --chdir $INSTALL_DIR -- $ARGS

Then opening a shell in the docker container and running:

./mshots_ctl.sh stop
./mshots_ctl.sh start

You should then see the error, in my case:

Error: /opt/mshots/node_modules/canvas/build/Release/canvas.node: invalid ELF header

The ELF header thing would be because my canvas.node binary was built for the wrong CPU architecture. I haven't figured out how to make it build for the right architecture yet!

jeroenpf commented 2 years ago

@roo2 I got the invalid ELF header error when i ran npm install on my host machine with a node version that did not match the version inside the container. Are you sure that the versions match so we can be certain it is not caused by this?

Googling a bit, it seems there is indeed a problem with M1 using certain versions of the package and node. A solution that seems to be prevalent is: arch -arm64 brew install pkg-config cairo pango jpeg giflib librsvg - I don't know how this would translate to our scenario.

roo2 commented 2 years ago

@jeroenpf I think I tried that, installing the brew dependencies with arch -arm64 and then tried playing around with the npm install command's --target-arch param. I couldn't get past that error with the ELF headers, I probably did something wrong or didn't do a clean install when I should have https://stackoverflow.com/questions/24961623/installing-node-js-packages-for-different-architecture