I've encountered similar issues upgrading my personal setup to Beta 2 - It's likely due to the new UID/GID changes introduced for security. Try changing up the LRR_UID/GID environment variables to 1000 or 0.
There might still be some demons lurking though, I'm still getting a few errors using the web uploader on Docker for this version.
I tried setting

```yml
environment:
  - LRR_UID=0
  - LRR_GID=0
```

in docker-compose.yml, but when I upload something the file is still owned by 9001:9001.
Yeah, UID/GID to 0 actually doesn't properly elevate the app to root; I've pushed something to fix this alongside some extra chmods to the database and thumbnail folder.
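For context, the kind of entrypoint adjustment being described is roughly the following; this is a hedged sketch, not the actual commit, and the paths and thumbnail folder name are assumptions:

```sh
# Give the configured user ownership of the content folder, and make sure the
# database file and thumbnail folder stay writable (paths are placeholders).
chown -R "$LRR_UID:$LRR_GID" /home/koyomi/lanraragi/content
chmod u+rw /home/koyomi/lanraragi/content/database.rdb
chmod -R u+rwX /home/koyomi/lanraragi/content/thumb
```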
LANraragi Beta 2 - WSL. This is probably the most common error I get, especially when running a fresh install of LANraragi.
DEBG 'lanraragi' stderr output:
Could not connect to Redis server at 127.0.0.1:6379: Address in use at /home/koyomi/lanraragi/lib/../lib/LANraragi/Model/Config.pm line 38.
...propagated at /usr/local/share/perl5/site_perl/Redis.pm line 606.
I think it's the cause of the Shinobu worker dying and of the many Mojolicious errors, though I'm not sure.
I've also been getting a similar error to the OP when the Shinobu worker dies even though I'm not using the web-upload.
DEBG 'lanraragi' stdout output:
860/s)
[Mojolicious] GET "/api/thumbnail" (2a3724ad)
[Mojolicious] Routing to controller "LANraragi::Controller::Api" and action "serve_thumbnail"
[Mojolicious] 200 OK (0.00109s, 917.432/s)
[Mojolicious] GET "/api/thumbnail" (34241689)
[Mojolicious] Routing to controller "LANraragi::Controller::Api" and action "serve_thumbnail"
[Mojolicious] 200 OK (0.001196s, 836.120/s)
[Mojolicious] GET "/api/thumbnail" (798ba584)
[Mojolicious] Routing to controller "LANraragi::Controller::Api" and action "serve_thumbnail"
[Mojolicious] 200 OK (0.001222s, 818.330/s)
[Mojolicious] GET "/api/thumbnail" (4c01e1c4)
[Mojolicious] Routing to controller "LANraragi::Controller::Api" and action "serve_thumbnail"
[Mojolicious] 200 OK (0.001131s, 884.174/s)
[Mojolicious] GET "/api/thumbnail" (2fa82c48)
[Mojolicious] Routing to controller "LANraragi::Controller::Api" and action "serve_thumbnail"
[Mojolicious] 200 OK (0.001001s, 999.000/s)
[Shinobu Boot] [info] Shinobu Background Worker terminated. (PID was 54)
[Mojolicious] GET "/logs"
I'll be sure to update to the nightlies.
> Try changing up the LRR_UID/GID environment variables to 1000 or 0.
Tried on the latest nightly with the fixed chmods, but no luck. I still think something's up with filename parsing, because I'm constantly getting these errors and I don't know where they come from:
2019-06-06 19:55:47,469 DEBG 'lanraragi' stderr output:
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE .+\.(png|jpg|gif|bmp|jpeg|webp|PNG|JPG|GIF|BMP|JPEG|WEBP)$/ at /home/koyomi/lanraragi/lib/../lib/LANraragi/Utils/Generic.pm line 34.
2019-06-06 20:04:39,664 DEBG 'lanraragi' stderr output:
'.' and '.' are identical (not copied) at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Archive.pm line 52.
2019-06-06 23:36:35,757 DEBG 'lanraragi' stderr output:
Use of uninitialized value $1 in concatenation (.) or string at /home/koyomi/lanraragi/script/../lib/LANraragi/Controller/Reader.pm line 37.
2019-06-06 20:03:11,014 DEBG 'lanraragi' stderr output:
Odd number of elements in anonymous hash at /home/koyomi/lanraragi/script/../lib/LANraragi/Controller/Api.pm line 75.
I think you should test the worker with some extremely bloated filenames with apostrophes, question marks, asterisks, quotes, hieroglyphs, special symbols and all that stuff. Here's an example of such a filename: вН╦ъ╓н╟Щим.zip
Or you could make it rename all uploaded archives to GUIDs that would be stored in the database.
Something like this? Not to be rude but this time around I'm fairly sure it's not a filename encoding problem.
I don't want to go the file renaming path as this prevents the software from being used with existing collections. (Otherwise I'd probably have done it a long time ago and spared myself dealing with filename encodings)
I still think your errors are coming from the Redis DB not functioning properly. Can you look at your content folder and do a ls -l | grep database.rdb? Although UID/GID to 0 should've fixed things already if it really was a permission issue.
The errors above are warnings that don't affect the app, although they do make for some misleading white noise; I'll try to get them out of the way in a future release.
> Something like this? Not to be rude but this time around I'm fairly sure it's not a filename encoding problem.
Sorry for being too pushy, I just want it to work goddammit xd
Well, I tried running it with LRR_UID: 0 and LRR_GID: 0, and it does in fact create database.rdb with root ownership, but the problem still remains.
After that I tried mounting redis.conf and changing parameters like daemonize and protected-mode, which five minutes of googling suggested changing, but the problem stayed the same. Then I created a separate Redis container and modified lrr.conf so it would connect to that container; it did create a database there and all that, but after I uploaded some zips the error popped up again.
I even tried redis-cli monitor and everything looked fine from there (at least it seemed that way to me); Redis didn't seem to crash or anything like that. Maybe the problem lies in the Perl library or a reconnect timeout? Or maybe I'm just stupid and missed something, who knows; I've never worked with Redis or Perl.
It's pretty strange at this point; UID/GID to 0 is basically the same behavior as in previous Docker releases. If you roll back to an earlier version, does it work properly?
Since it only becomes unavailable (seemingly not a crash) after we try to write to it, I thought it was permission problems on the database, but that doesn't seem to be the case. You don't get anything out of the ordinary from the monitor command?
Address not available is an uncommon error as well; you might have some luck switching the 127.0.0.1 part of the configured address to something a bit broader like localhost, or [::1] for IPv6.
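If it helps, a quick way to sanity-check each candidate address is to ping Redis directly from inside the container; this is just a generic check (assuming redis-cli is available there, which it normally is since Redis runs in the same container), not something LANraragi itself requires:

```sh
# Each of these should answer PONG if Redis is reachable on that address.
redis-cli -h 127.0.0.1 -p 6379 ping
redis-cli -h localhost -p 6379 ping
redis-cli -h ::1 -p 6379 ping   # IPv6 loopback
```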
@AnyByte Any news on this?
> @AnyByte Any news on this?
I'm currently a bit busy, I will check in a couple of days.
Well, I cloned the repository and built a Docker image from the dev branch myself on my Windows PC, and everything worked flawlessly; overall responsiveness even became much better, no matter how hard I tried to crash it with constant uploading and clicking on everything.
So I exported this image and imported it on my server running ESXi with a CentOS 7 VM, and saw the same Address not available error almost right away. I was also clearing the thumbnails and database.rdb every time, so that I would know for sure it's not a corrupted database causing this.
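For reference, the reset between attempts amounts to something like the following; the content path and thumbnail folder name here are assumptions for the example, not necessarily the exact ones in this setup:

```sh
# Stop the container first, then wipe the database and thumbnails so each
# test starts from a clean state.
rm -f  /path/to/content/database.rdb
rm -rf /path/to/content/thumb
```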
> You don't get anything out of the ordinary from the monitor command?
Nope, nothing out of the ordinary. Redis doesn't even crash or restart.
It sounds like a very strange issue to me. Maybe it's something to do with how the CentOS filesystem handles permissions outside Docker, or because I have an SSD on my PC and an HDD on the CentOS VM.
I will keep tinkering with it; I'm just letting you know my experience so far with this issue in case it helps you figure out what may be causing it.
I'm banking on weird file permissions at this point, yeah. Do old versions work properly?
Hi, I am new to LANraragi, and because I didn't want to use Docker (I am using Proxmox, which uses LXC containers), I created a VM from the Dockerfile instructions, so I have an Alpine Linux VM that works the same way.
I am using the dev branch of the sources, as I have just cloned the git repository, and I have the same problem: the worker keeps being in the Kaput state.
I can provide any information you need.
Have you duplicated the entire part with su-exec and the matching environment variables as well? I was suspecting permission problems, but that doesn't seem to be the issue at hand.
Does your issue also stem from the Redis server dying? The dev branch has a better check for it at the moment, if you're not already on that.
Past that, I admit the bug stumps me; it might be an issue with supervisord, but I'm just throwing guesses at this point. 😥
No; in fact, since at first I forgot to define the LRR_NETWORK environment variable, the command line with su-exec didn't work as expected. After some digging, I found the problem with this variable and created a little quick and dirty run script:
```sh
export LRR_NETWORK="http://*:3000"
cd /home/koyomi/lanraragi
supervisord --nodaemon --configuration ./tools/DockerSetup/supervisord.conf
```
Because I am not in Docker, I can't use the entrypoint as-is to start the software. And since I haven't set up a proper service script yet, I am running this script directly as the koyomi user in a screen terminal so I can detach it.
This is wrong, I know, but so far it works.
The reported version is 0.6.0-beta.2 and I am currently using the c8d5376 commit.
I have grepped for "redis" in the logs folder without finding anything very interesting. The last startup of Redis was 2 days ago, on the 28th, and apart from a warning about THP, there is nothing about Redis dying or anything else.
Please note that in my attempt to "simulate" the Docker environment, I may have missed some environment variables. I didn't try to find the right bash equivalent for the command ENV EV_EXTRA_DEFS -DEV_NO_ATFORK. The other ENV commands are quite straightforward, as they are simple lists of variable assignments, e.g. ENV LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 LANGUAGE=en_US.UTF-8 and ENV LRR_UID=9001 LRR_GID=9001, but the EV_EXTRA_DEFS one is not an = assignment, and I don't know how Docker handles that case, so maybe I am missing something important here.
The equivalent command for the EV_EXTRA_DEFS variable is export EV_EXTRA_DEFS="-DEV_NO_ATFORK". It's apparently required by Mojo in a Docker context (not even sure about that), so it probably won't change much.
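Put together, the shell equivalents of the Dockerfile ENV lines quoted above would be something like this (values copied from those lines, nothing specific to your setup):

```sh
# Same environment the Docker image defines, exported for a "simulated" install.
export LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 LANGUAGE=en_US.UTF-8
export LRR_UID=9001 LRR_GID=9001
export EV_EXTRA_DEFS="-DEV_NO_ATFORK"
```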
You're basically running close to a source install here, except it's jammed in an Alpine VM.
Since everything is working save for the background worker, you can probably go ahead and enable debug mode too.
Do you get any kind of specific errors when starting the program, or is it just something non-specific like Shinobu Worker Terminated?
OK, I will enable debug mode and come back if I see anything interesting.
But I have to mention this: to be able to post my previous reply, I had to check some things, and discovered that the LXC container was using the maximum swap space available (set to 500MB by default in Proxmox), so I changed the memory setting to 2GB for both RAM and swap.
Since then, the worker is still running fine. But I have not added any more files to the library, which could also mean something.
And indeed, when it happens, the only log entry is Shinobu Worker Terminated.
I will add some files to see if it crashes again, and if not, will test again with less RAM.
Damn, I never thought about RAM! It's weird that it doesn't show an explicit error about it, though. Maybe an implementation quirk in Proc::Background.
I tried adding something like 38 new archives to the content folder, and the Shinobu worker has stayed alive since my last message. But the odd part is that according to Proxmox, even though I have granted 2GB for both RAM and swap, the swap is not used, and not much of the RAM either.
So it is a bit strange. I have restarted my container with 1GB of RAM and 500MB of swap, and I am adding 185 archives for a total of 40GB of files; I will let you know if the worker crashes again. I am still in debug mode.
Here is a little progress update: during the analysis phase of the 40GB that I added (it is still running), the system uses swap instead of RAM (I don't really understand why). For the moment the worker is the one that was started at boot and is still running; I can see logs about new file detection.
OK, so the test is done and everything went smoothly: more than 40GB and 253 new titles have been added.
The only thing I can think of now is that there is a major difference in how I gave the files to LRR: at first, when the worker crashed, I uploaded something like 50-100 files at once via the web browser, with the page open to watch the plugin progress. Now I am putting files directly in the content folder and relying on LRR scanning them itself via inotify.
I will not be able to test this for at least another 5 hours.
So my theory doesn't seem to be valid. I tried uploading something like 200 archives, left the page open to watch the auto-plugin progress, then tried to upload another batch of files while the library generation popup was displayed, without any crash.
In fact, the worker doesn't want to crash anymore. I don't know why. Sorry.
I managed to reproduce this on my own setup and "might" know where it's coming from.
[LANraragi] [info] Terminating previous Shinobu Worker if it exists... (PID is 19)
2019-10-09 20:03:40,877 DEBG fd 9 closed, stopped monitoring <POutputDispatcher at 140252593785672 for <Subprocess at 140252593784808 with name redis in state RUNNING> (stdout)>
2019-10-09 20:03:40,877 DEBG fd 14 closed, stopped monitoring <POutputDispatcher at 140252593786104 for <Subprocess at 140252593784808 with name redis in state RUNNING> (stderr)>
2019-10-09 20:03:40,878 INFO exited: redis (terminated by SIGKILL; not expected)
2019-10-09 20:03:40,878 DEBG received SIGCHLD indicating a child quit
2019-10-09 20:03:43,881 DEBG 'lanraragi' stderr output:
Can't load application from file "/home/koyomi/lanraragi/script/lanraragi": Could not connect to Redis server at 127.0.0.1:6379: Connection refused at /home/koyomi/lanraragi/script/../lib/LANraragi/Model/Config.pm line 38.
...propagated at /usr/local/share/perl5/site_perl/Redis.pm line 613.
Compilation failed in require at (eval 85) line 1.
There's a feature in place to terminate the background worker's PID if it's still running upon a server restart; The PID is stored in a file so the next server process can pick it up.
It's very likely that on a container restart, the PID in this file can match the Redis PID, leading to LRR killing its own database. Brilliant.
I have doubts about whether this PID-killer feature is useful, but for now, checking that the PID belongs to a perl process should be enough to fix this bug.
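As a rough illustration of that check (a sketch of the idea only, not the actual patch; the pid-file path is a placeholder):

```sh
# Only kill the stored PID if it still belongs to a perl process, so a
# recycled PID (e.g. the Redis server) is left alone.
pidfile=/path/to/shinobu.pid   # hypothetical location of the stored worker PID
if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    if [ -r "/proc/$pid/comm" ] && grep -q perl "/proc/$pid/comm"; then
        kill "$pid"
    fi
fi
```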
Additionally, the previous conjecture about RAM was right on the money: When Redis RAM usage grows to half of what's available in the container, saves of the database will start failing due to fork() not being able to allocate enough RAM. (See https://github.com/docker-library/redis/issues/93#issuecomment-363587122)
LRR in Docker is configured to error out if the database can't be written to anymore.
Using the overcommit_memory trick fixes this here for container installs (https://redis.io/topics/faq#background-saving-fails-with-a-fork-error-under-linux-even-if-i-have-a-lot-of-free-ram).
Sadly I can't set it myself for Docker users since it's a kernel parameter and therefore touches the entire system: Folks'll have to enter the command manually if they need it.
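For reference, the command from the linked Redis FAQ is run on the Docker host (not inside the container), as root:

```sh
# Allow the kernel to overcommit memory so Redis' background-save fork() succeeds.
sysctl vm.overcommit_memory=1
# To persist it across reboots, add "vm.overcommit_memory = 1" to /etc/sysctl.conf.
```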
What can be done is to add a warning on start, and check the codebase for unclosed Redis objects -- there seem to be a few of them on archive extraction and when building the filemap.
It's another story for Windows users, as we do have complete control over the kernel there. Not sure if the parameter really does anything on WSL1, however.
I can thank server-side search in a way since it stresses Redis much more now, making this error much easier to replicate. 👍
Another issue was due to hypnotoad's hot deploy feature on Docker container restarts: the previous hypnotoad.pid file was left hanging in the container's filesystem, leading the server to believe a previous instance was still present.
Hypnotoad's hot deploy sends SIGUSR2 to the process marked in this pid file, which could also be either the redis instance or...ourselves.
Removing those rogue .pid files in the container's entrypoint seems to be the last piece of the puzzle.
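In entrypoint terms that cleanup boils down to something like the line below; the install path is an assumption for the example, not necessarily what the image actually uses:

```sh
# Remove any leftover .pid files before starting the server, so hypnotoad's
# hot-deploy logic can't send SIGUSR2 to an unrelated (recycled) PID.
find /home/koyomi/lanraragi -name "*.pid" -type f -delete
```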
LRR Version and OS: LANraragi 0.6.0-BETA.2, Docker
Bug Details: Every time I upload an archive, the Shinobu Worker dies and I have to restart it manually for the new archive to appear in the list; after some period of time it also dies on its own.
I think it's somehow related to hieroglyphs in the filenames, and that because of them the thumbnail cache may be broken, judging by this line:
[Auto-Tagger] [info] Thumbnail hash invalid, regenerating.
but I might be wrong, considering it also loses connection to Redis for some reason: Full Logs