Closed GoogleCodeExporter closed 9 years ago
Hi Jerry. Thanks for your issue report.
Before we continue I need to understand some more details about your problem.
When you detect that rar2fs is consuming 100% of the CPU, can you tell if it
happens at some specific operation towards the mounted folder? Is it always
after accessing the same file or is it completely random? Does the problem
happen also if you don't access any files?
Is this a single- or multi-core server? If it is a multi-core server, what
happens if you try the option --no-smp when starting rar2fs?
This is the first time I have had a report on this specific issue so I do not
have any clues yet to what could actually be the root cause. What you could try
(since you run 1.15.5 I assume you built rar2fs yourself) is to configure with
--enable-debug=2 and re-compile. Then run with the -f flag. This will log a lot
to the terminal from which rar2fs it is started so try to use a fairly large
history buffer. Then at least we can see what rar2fs is doing when CPU load sky
rockets.
Also, connect to the process using gdb is an option and check current stack.
This provides that the server can respond to anything when CPU load is so high.
Since CPU load is stuck at 100% I suspect a loop of some kind. Could be that
PMS is triggering a case I have not yet been able to test. It could also be
something in FUSE or libunrar :(
Did you compile against unrar source 4.2.4?
Hans
Original comment by hans.bec...@gmail.com
on 17 Jan 2013 at 4:48
One other thing. How is PMS interacting with the mounted folder? Does it try to
write something to it? I have never tested PMS myself so any information you
can give me is valuable. How does the PLEX client interact with the server? I
guess it is using some network protocol for this, do you know which one?
And, when you see the problem, how many threads have actually been created by
rar2fs? You can install eg. htop which will show you that in a nice way.
Is there something in /var/log/messages or dmesg that can hint towards
something?
Hans
Original comment by hans.bec...@gmail.com
on 17 Jan 2013 at 5:11
Hi, thanks for your quick response!
i will try to implement your suggestions and i will post back what i have when
it crashes the next time.
I will post further info regarding my build when i get home from work.
Original comment by rapp.je...@gmail.com
on 17 Jan 2013 at 5:17
Great. One more question, do you have a huge amount of data in /data/media ?
What happens if you limit that to only a few files, does it still behave the
same?
What we need to do here is to narrow it down a bit so we can rule out some
components involved. Currently I am leaning towards some problem with massive
"attacks" towards the file system by PMS caused by some periodic indexing maybe?
Hans
Original comment by hans.bec...@gmail.com
on 17 Jan 2013 at 5:26
Original comment by hans.bec...@gmail.com
on 17 Jan 2013 at 5:29
To better mimic your environment I installed PMS on my QNAP NAS. It is a dual
core 1Ghz Intel Atom server, so not a very powerful machine in general. And, I
think I might see a behavior similar to what you describe. It is related to the
scanning of the media library by PMS, which I actually suspected already from
the beginning. However, even though rar2fs is consuming a whole lot of CPU, it
is only during the scanning process which in my case is *extremely* slow. An
update of the library with around 300 media files took close to 30 minutes!
During that time the CPU load is very high but once the scan is complete the
load drops back to normal again. It could appear as if rar2fs is locking up but
it did not in my case. The server just can not respond to any heavy tasks
during the PMS library update. Also, I am running rar2fs on top of another FUSE
based encryption filesystem and that too peaks at almost 100% during the same
time frame.
Can you confirm if rar2fs actually is stuck at 100% or if it is only during the
time of the PMS media library update? Check your Plex/Web timeline, it will
show the current progress/activity. I do not know what settings you made in
PMS, but if you have the periodic update enabled it could explain why you see
this only now and then.
Original comment by hans.bec...@gmail.com
on 18 Jan 2013 at 8:33
[deleted comment]
[deleted comment]
More tests conclude that running with a patched version of rar2fs that ignores
compressed archives completely rectifies the CPU load issue. However, indexing
of the media library only went down from 30 or so minutes to around 27 minutes
so nothing much was really gained on this system other than that the PMS server
is more responsive during the media library scan. I have not tested the deep
scan mode yet.
Locking rar2fs to a single CPU core using --no-smp does not have any effect
other than making things worse since load balancing performed by Linux is
disabled for this process.
Original comment by hans.bec...@gmail.com
on 18 Jan 2013 at 9:37
Can you try to replace your current rar2fs.c file with this patched one and
rebuild.
Using this CPU load is still high but if something else needs the CPU it will
be granted so. PMS indexing time was reduced somewhat and at the same time
server responsiveness was radically increased.
Original comment by hans.bec...@gmail.com
on 19 Jan 2013 at 1:15
Attachments:
My system
Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4
memory: 6G
Rar2fs
rar source 4.2.4
Rar2fs has been taking up 100% cpu for several hours now and the only program
that is unresponsive is plex. The rest of the computer is fine/responsive.
Nothing in dmesg that i can tell.
I do have a large media drive ~8TB but on my old system with older versions of
plex and rar2fs this was never a problem.
Last lines of Plex Media scanner.log
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - HTTP requesting to:
http://127.0.0.1:32400/:/metadata/notify/cullTimeline?librarySecti$
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - Scanning Anime with 1449 current
items in the database...
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - Performing a scan with 'Plex
Series Scanner' (language: en virtual: 0).
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - * Scanning
/data/rar2media/anime
so im guessing it has been stuck for 24hrs now.
in htop i get:
16335 plex 20 0 166M 67664 376 S 100. 1.1 26h55:18 ├─
/bin/rar2fs --seek-length=1 -o allow_other /data/media/ /data/rar2med
16336 plex 20 0 166M 67664 376 R 100. 1.1 26h55:16 │ ├─
/bin/rar2fs --seek-length=1 -o allow_other /data/media/ /data/rar2
8394 plex 20 0 0 0 0 Z 0.0 0.0 0:00.00 │ └─ rar2fs
I will try some of your recompiling advice now
I don't really know if there is a way to see what a process is doing right now
in linux as i am more of a windows user.
Original comment by rapp.je...@gmail.com
on 19 Jan 2013 at 9:47
Well definitely does not look good :(
The zombie process indicates that the thread could not terminate properly.
What exact version of rar2fs did you use before, and was it the same version of
Ubuntu?
Did your old system use the same version of FUSE?
Original comment by hans.bec...@gmail.com
on 19 Jan 2013 at 11:28
You can try to attach gdb to either PID 16335 or PID 16336.
Eg. gdb --pid 16336
Then you can dump the current stack using 'where'
It might give a hint to what is going on.
But PID 16335 is sleeping so it is probably just blocking on something.
PID 16336 is the most interesting one I would guess.
Original comment by hans.bec...@gmail.com
on 19 Jan 2013 at 11:34
Another thing you can check is under /proc filesystem.
Every process has its own entry there.
Eg.
cd /proc/16336
cat stack
This should provide you with a fairly recent snapshot of the call stack.
Original comment by hans.bec...@gmail.com
on 19 Jan 2013 at 1:13
If you get the chance, also try to reproduce the problem with the attached
patch.
Original comment by hans.bec...@gmail.com
on 20 Jan 2013 at 9:57
Attachments:
Sure i'll try. But i haven't had a lockup for days now. I'll report my findings
as soon as/if it crashes again.
Original comment by rapp.je...@gmail.com
on 21 Jan 2013 at 6:27
Just got a freeze, unpatched version.
I redacted the filename.
From gdb:
0x00007fa95ce3ed2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0 0x00007fa95ce3ed2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000000000409a12 in read (__nbytes=315160, __buf=0x7fa9560091f0,
__fd=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/unistd.h:45
#2 extract_to (file=<optimized out>, sz=315160, fp=0x7fa95461daf0, oper=1,
entry_p=<optimized out>)
at rar2fs.c:235
#3 0x0000000000409c3a in listrar_rar (path=0x7fa9545d7c30 "/utdelat",
buffer=0x7fa95bc84c18,
arch=0x7fa95bc84a40 "/data/media/utdelat/<redacted>.rar", hdl=<optimized out>, next=0x7fa9545ffc20,
entry_p=0x7fa95461a630, need_password=0) at rar2fs.c:1518
#4 0x000000000040aa47 in listrar (path=0x7fa9545d7c30 "/utdelat",
buffer=0x7fa95bc84c18,
arch=0x7fa95bc84a40 "/data/media/utdelat/<redacted>.rar", Password=0x0) at rar2fs.c:1755
#5 0x000000000040c197 in readdir_scan (dir=0x7fa9545d7c30 "/utdelat",
root=0x7fa95bc84bb0 "/data/media/utdelat",
next=0x7fa95bc84c18) at rar2fs.c:1979
#6 0x000000000040c56b in rar2_readdir (path=0x7fa9545d7c30 "/utdelat",
buffer=0x7fa95448cc30,
filler=0x7fa95d7b1de0, offset=<optimized out>, fi=<optimized out>) at rar2fs.c:2037
#7 0x00007fa95d7b5149 in fuse_fs_readdir () from /lib/libfuse.so.2
#8 0x00007fa95d7b538c in ?? () from /lib/libfuse.so.2
#9 0x00007fa95d7bc596 in ?? () from /lib/libfuse.so.2
#10 0x00007fa95d7bad65 in fuse_session_loop () from /lib/libfuse.so.2
#11 0x0000000000408683 in work_task (data=0x7fff1ffa0d70) at rar2fs.c:3379
#12 0x00007fa95ce37e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fa95cb64cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()
from /proc/pid/stack
nexx@lemur:/proc/23484$ sudo cat stack
[sudo] password for nexx:
[<ffffffff81090708>] hrtimer_nanosleep+0xb8/0x180
[<ffffffff81090846>] sys_nanosleep+0x76/0x90
[<ffffffff81665a02>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Original comment by rapp.je...@gmail.com
on 21 Jan 2013 at 7:57
Ok, seems there is a problem with the RAR inside RAR function :(
If the stack is taken from the looping thread/process then it seems read is
stuck trying to get data that never arrives on the pipe. Probably the unrar
process died without notification. Nothing that I have ever encountered before
I am afraid.
I would be good if you could provide me the details (in private mail if you so
wish) about the .rar file in question. If I could get access to it I might be
able to reproduce the problem. That would certainly make debugging easier ;)
Otherwise I will have to ask for your help to run different version of rar2fs
in an attempt to find the problem.
Original comment by hans.bec...@gmail.com
on 21 Jan 2013 at 9:34
right. it seems that if i try to list (navigate to dir and typing ls) that
particular .rar file, rar2fs will lock up. It isn't always taking up 100% cpu
but it wont respond anymore. I will send the file to your gmail address if
that's OK? I'm guessing i have other rars that lock up similarly as the
offending file isn't in a directory that plex watches.
Original comment by rapp.je...@gmail.com
on 21 Jan 2013 at 9:52
[deleted comment]
The problem has been identified to show up when there are non-flat .rar files
inside the main archive. The solution is either to remove support for this
specific type of archives (not expand the .rar inside main archive) or do some
coding to display it properly. Have not decided on what the final solution will
be yet.
In the mean-while, try using the --seek-depth=0 flag. It should resolve the
lock-up issue, but instead will completely disable the support for .rar files
inside main archive.
Original comment by hans.bec...@gmail.com
on 21 Jan 2013 at 11:30
Progress, But still a few miles to go. Actually, while trying to resolve this
problem numerous other issues popped up. In short, this will be a major update.
That also means official commit will be delayed due to the amount of regression
testing involved.
Original comment by hans.bec...@gmail.com
on 23 Jan 2013 at 11:14
Fix committed to trunk. Please test the latest version and verify that the
issue is resolved.
Original comment by hans.bec...@gmail.com
on 25 Jan 2013 at 8:39
Right. I have installed the latest ver from trunk and so far everything seems
good. I tried the file that crasched before and now it works perfectly so
hopefully the issue is sorted.
I'll post again if it crasches.
Thanks a lot for your work!
Original comment by rapp.je...@gmail.com
on 25 Jan 2013 at 9:12
Since I have no heard anything in a while I assume things are working as
expected.
Case closed.
Original comment by hans.bec...@gmail.com
on 3 Feb 2013 at 8:34
Original issue reported on code.google.com by
rapp.je...@gmail.com
on 17 Jan 2013 at 3:54