dnbert / prm

PRM allows you to quickly build package repositories, inspired by Jordan Sissel's FPM
MIT License

too-large rpm hangs when processed #52

Open trel opened 8 years ago

trel commented 8 years ago

I have been working through automating our build process, and our existing development package is the only one that does not succeed when processed by prm. After investigating further and generating test RPMs to rule out anything specific to our package, I have found the point at which some size threshold is surpassed.

I am using EPM to generate RPM files.

                            Works        Hangs
EPM's test.list filesize    4,742,089    4,744,637
RPM's test.spec filesize    5,130,897    5,133,653
RPM filesize                13,062,868   13,070,664

beginning and end of test.list

%product EPM testing
%copyright BSD
%vendor Testing <testing@example.org>
%license foo
%readme foo
%description Example description
%version 1
%format all

d 755 root root /usr/lib/epmtesting -
f 644 root root /usr/lib/epmtesting/foo1 foo
f 644 root root /usr/lib/epmtesting/foo2 foo
f 644 root root /usr/lib/epmtesting/foo3 foo
...
f 644 root root /usr/lib/epmtesting/foo96998 foo
f 644 root root /usr/lib/epmtesting/foo96999 foo
f 644 root root /usr/lib/epmtesting/foo97000 foo

instrumented prm and arr-pm output for hanging case

Built Yum repository for centos6
x86_64
1 write begin 16384
  write complete
  read begin
  reading...
  ---- SKIPPING
  read complete
2 write begin 16384
  write complete
  read begin
  reading...
  ---- SKIPPING
  read complete
3 write begin 16384
  write complete
  read begin
  reading...
  ---- SKIPPING
  read complete
4 write begin 16384
  write complete
  read begin
  reading...
  ---- SKIPPING
  read complete
5 write begin 9412

htop output for hanging case

11902 trel        20   0 1304M 1203M  3732 S  0.0  0.5  0:00.00 │  │  │  │  │     ├─ /usr/bin/ruby1.9.1 /usr/local/bin/prm -t rpm -p pool -r centos6 -a x86_64
11901 trel        20   0  4440   640   540 S  0.0  0.0  0:00.02 │  │  │  │  │     └─ sh -c xz -d | cpio -it --quiet
11904 trel        20   0  7304   624   524 S  0.0  0.0  0:00.00 │  │  │  │  │        ├─ cpio -it --quiet
11903 trel        20   0 12012  1084   752 S  0.0  0.0  0:00.00 │  │  │  │  │        └─ xz -d

strace output for hanging case

$ sudo strace -s 300 -p 11904
Process 11904 attached
write(1, "sr/lib/epmtesting/foo70470\n./usr/lib/epmtesting/foo70471\n./usr/lib/epmtesting/foo70472\n./usr/lib/epmtesting/foo70473\n./usr/lib/epmtesting/foo70474\n./usr/lib/epmtesting/foo70475\n./usr/lib/epmtesting/foo70476\n./usr/lib/epmtesting/foo70477\n./usr/lib/epmtesting/foo70478\n./usr/lib/epmtesting/foo70479\n./u"..., 4096

prm hangs here: https://github.com/dnbert/prm/blob/master/lib/prm/rpm.rb#L242

The strace write() call is from the ruby-arr-pm library here: https://github.com/jordansissel/ruby-arr-pm/blob/master/lib/arr-pm/file.rb#L210

It feels like a logic error once the filelist gets too long - but I've hit the edge of what I can diagnose by inspection. And of course, if this is a bug in arr-pm proper, apologies here.
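
The strace line above shows cpio stopped inside a write() to its stdout, which is how a process typically looks when the pipe it writes to is full and nothing is reading from the other end. As a rough illustration only (nothing here is taken from prm or arr-pm, and `pipe_capacity` is a hypothetical helper), this Ruby sketch measures how many bytes a pipe accepts before a writer would block:

```ruby
# Measure how many bytes an OS pipe accepts before a writer would block.
# Illustration of pipe buffering only; not code from prm or arr-pm.
def pipe_capacity(chunk_size = 4096)
  reader, writer = IO.pipe
  chunk = "x" * chunk_size
  total = 0
  begin
    # write_nonblock raises instead of blocking once the pipe buffer is full.
    loop { total += writer.write_nonblock(chunk) }
  rescue IO::WaitWritable
    # Buffer is full: a blocking write would wait here until someone reads.
  ensure
    writer.close
    reader.close
  end
  total
end

puts "pipe accepts #{pipe_capacity} bytes before a write would block"
```

The number printed depends on the operating system. Whatever it is, a file list longer than that cannot sit in the pipe unread, which may be why the hang appears only past a certain package size.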

trel commented 8 years ago

@jordansissel Do you have any insight into whether prm is calling arr-pm incorrectly, or this is an internal arr-pm bug?

dnbert commented 8 years ago

Hi @trel sorry for my reply being so late!

I think this is a bug in arr-pm, but it might be due to this payload assignment here: https://github.com/jordansissel/ruby-arr-pm/blob/master/lib/arr-pm/file.rb#L208

How large is the list in the rpm? Any chance you could provide me with a sample rpm (something at the least similar to your RPM)?

trel commented 8 years ago

The list is 97k lines long:

$ rpm -qpl linux-2.6-x86_64/epmtest-1-linux-2.6-x86_64.rpm | wc -l
97001

first of the file list

$ rpm -qpl linux-2.6-x86_64/epmtest-1-linux-2.6-x86_64.rpm | head
/usr/lib/epmtesting
/usr/lib/epmtesting/foo1
/usr/lib/epmtesting/foo10
/usr/lib/epmtesting/foo100
/usr/lib/epmtesting/foo1000
/usr/lib/epmtesting/foo10000
/usr/lib/epmtesting/foo10001
/usr/lib/epmtesting/foo10002
/usr/lib/epmtesting/foo10003
/usr/lib/epmtesting/foo10004

last of the file list

$ rpm -qpl linux-2.6-x86_64/epmtest-1-linux-2.6-x86_64.rpm | tail
/usr/lib/epmtesting/foo9990
/usr/lib/epmtesting/foo9991
/usr/lib/epmtesting/foo9992
/usr/lib/epmtesting/foo9993
/usr/lib/epmtesting/foo9994
/usr/lib/epmtesting/foo9995
/usr/lib/epmtesting/foo9996
/usr/lib/epmtesting/foo9997
/usr/lib/epmtesting/foo9998
/usr/lib/epmtesting/foo9999

I've been generating an EPM list file with this script...

$ cat generate_test_epm.sh 
#!/bin/bash -e

NUMBEROFLINES=97000
# increase this until prm hangs
# NUMBEROFLINES=98000

# local file
FILENAME=foo
dd if=/dev/zero of=$FILENAME bs=1k count=10
echo "foo" > $FILENAME

# prepare preamble
echo -e "
%product EPM testing
%copyright BSD
%vendor Testing <testing@example.org>
%license foo
%readme foo
%description Example description
%version 1
%format all
"

# list of files
echo "d 755 root root /usr/lib/epmtesting -"
for (( i=1; i<=$NUMBEROFLINES; i++ )); do
    echo "f 644 root root /usr/lib/epmtesting/$FILENAME$i $FILENAME"
done

Then generating an RPM via EPM

$ ./generate_test_epm.sh > test.list
$ epm -k -f rpm epmtest test.list

The RPM will then be local

$ ls -l linux*/epmtest*
-rw-r--r-- 1 trel trel 13062880 Sep 29 14:33 epmtest-1-linux-2.6-x86_64.rpm
-rw-r--r-- 1 trel trel  5130897 Sep 29 14:32 epmtest.spec

A sample file can be found here for 30 days...

This will hang when processed by prm (shown here with the new -d option, which should be irrelevant to the hang)

$ prm -t rpm -p pool -r centos6 -a x86_64 -d .

I've instrumented arr-pm with the following

$ git diff -w
diff --git a/arr-pm.gemspec b/arr-pm.gemspec
index 4645b97..d515237 100644
--- a/lib/arr-pm/file.rb
+++ b/lib/arr-pm/file.rb
@@ -204,17 +204,26 @@ class RPM::File
     end
     payload_fd = payload.clone
     output = ""
+    count = 0
     loop do
+      count += 1
       data = payload_fd.read(16384, buffer)
       break if data.nil? # listerextractor.write(data)
+      puts "#{count} write begin #{data.length}"
       lister.write(data)
+      puts "  write complete"

       # Read output from the pipe.
+      puts "  read begin"
       begin
+        puts "  reading..."
         output << lister.read_nonblock(16384)
       rescue Errno::EAGAIN
         # Nothing to read, move on!
+        puts "  ---- SKIPPING"
       end
+      puts "  read complete"
     end
     lister.close_write

dnbert commented 8 years ago

Hey @trel I couldn't get an EPM build to work. I'll have an strace of the EPM error, but it essentially said it couldn't build the package. I'll try with an FPM build later!

trel commented 8 years ago

thanks, a large enough fpm-produced package should hit this as well.

in the meantime, we've got a local temporary workaround with rpm2cpio.
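
The details of that workaround are not given in this thread. A minimal sketch of what an rpm2cpio-based listing could look like, assuming rpm2cpio and cpio are installed (`rpm_listing_command` and `list_rpm_files` are hypothetical names, not prm or arr-pm functions):

```ruby
require "shellwords"

# Build a shell pipeline that lists an RPM's payload without going through arr-pm.
# Assumes rpm2cpio and cpio are on the PATH; the cpio flags mirror the htop output
# earlier in this thread.
def rpm_listing_command(rpm_path)
  "rpm2cpio #{Shellwords.escape(rpm_path)} | cpio -it --quiet"
end

# Run the pipeline and return the file list. Backticks read until EOF,
# so the pipeline's output is drained for as long as it keeps producing.
def list_rpm_files(rpm_path)
  `#{rpm_listing_command(rpm_path)}`.lines.map(&:chomp)
end

# Example (requires the tools and an RPM on disk):
#   list_rpm_files("epmtest-1-linux-2.6-x86_64.rpm").length
```

Because the shell feeds the RPM straight into the pipeline, the Ruby side only ever reads, so it never has to interleave writes and reads on the same child process.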

dnbert commented 8 years ago

@trel definitely replicated with an fpm produced package. Hoping to dig in more tonight

trel commented 8 years ago

excellent news.

dnbert commented 8 years ago

I've looked a bit more and it is definitely an issue on the arr-pm side, or at least that's where it's hitting. I tested a few different things, and after lowering the read length I was able to build out a repo with an RPM that had 95000+ files in it. My guess is that the pipe for the IO object is not being flushed, or is too small, for the type of content we're throwing at it, but again that's a guess.

I tried various read lengths: 8000, 1000, 500, 100, and 10.

I can't imagine @jordansissel will want to reduce the read length from 16k to 1, since generating the repository for a single package then takes quite a bit of time. I was able to build the repository after changing the read length to 1 and adding some stdout output (puts "test") to my copy of the arr-pm library.

trel commented 8 years ago

I was also able to get the original code to work with extra puts when trying to work out what was happening. Comment out the debugging, and it would hang again. It does feel like a flush-related thing.
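
One reading that fits these symptoms (an assumption, not something confirmed in this thread): the instrumented loop does a blocking write followed by a single non-blocking read. If the child produces output faster than that one read drains it, the child's stdout pipe can fill, the child stops reading its stdin, and the parent's next write blocks. Extra puts calls would change the timing, which might explain why debugging output made the hang disappear. A common way to avoid this class of stall is to drain stdout concurrently with the writes. The sketch below uses a reader thread; `filter_through` is a hypothetical helper, not an arr-pm API, and `cat` stands in for the real pipeline:

```ruby
require "open3"

# Feed `input` to a child process in fixed-size chunks while a separate
# thread drains the child's stdout, so neither side stalls on a full pipe.
# `command` is any filter; the pipeline in this thread was "xz -d | cpio -it --quiet".
def filter_through(command, input, chunk_size = 16_384)
  Open3.popen2(command) do |stdin, stdout, _wait_thread|
    # Reader thread: keeps consuming output for as long as the child runs.
    reader = Thread.new { stdout.read }

    offset = 0
    while offset < input.bytesize
      stdin.write(input.byteslice(offset, chunk_size))
      offset += chunk_size
    end
    stdin.close # signal EOF so the child can finish

    reader.value # joins the thread and returns everything it read
  end
end

# Usage: push about 1 MB through `cat`, far more than a pipe buffer holds.
data = "./usr/lib/epmtesting/foo\n" * 40_000
result = filter_through("cat", data)
puts result.bytesize == data.bytesize
```

Keeping the 16k chunk size means throughput is not sacrificed the way it would be with a read length of 1. Waiting on both descriptors with IO.select would be a single-threaded alternative.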

Lowering the buffer size did not seem to help for me, even when set to 1.

dnbert commented 8 years ago

@trel so while it does look like the pipe size is the issue on the read, there's not much more for me to go on unfortunately. I've kinda tapped my expertise, but yea it's definitely an arr-pm issue

trel commented 8 years ago

I'll file an issue and point back here. Thanks @dnbert

quanah commented 8 years ago

Hm, I was interested in possibly using this for Zimbra, but this would be a total blocker. :(