box-project / box2

An application for building and managing Phars.
https://box-project.github.io/box2/
MIT License
1.19k stars 80 forks source link

Long build times #97

Open basz opened 9 years ago

basz commented 9 years ago

Hi,

I haven't use box much but did try to package some zendframework applications with it. I noticed (via running it verbose) that as more files are included took a little bit longer per file to do so. And in the end it was at crawling speed. Took about 1.5 hour to a medium sized project. (a few thousand files)

I noticed that the resulting phar file that is being written to kept changing file size while the build took place. Is this a known issue with a work around?

Could it be related to https://scrutinizer-ci.com/blog/composer-gc-performance-who-is-affected-too ?

Bas

Bilge commented 9 years ago

Observed exactly the same behaviour. This is too slow to be usable in practice. This is despite disabling compression and compacting. Running with php -d zend.enable_gc=0 did not help markedly if at all.

padraic commented 9 years ago

Only ever used it on small items, like CLI apps. Was file size growing constantly, or reducing and then growing back up to slightly bigger each time?

basz commented 9 years ago

Hi @padraic, the latter. File size goes from zero to bigger constantly, it looks like the phar is being rewritten every time a file is added. Box is unusable for me on these kind of projects, so for now rolling my own phar building, which still takes 6-8 minutes to complete.

kherge commented 9 years ago

When you run box build, the entire archive is rebuilt from scratch.

Unfortunately, incremental building isn't supported in this version of box, but it is planned for the next version (#84).

Bilge commented 9 years ago

Incremental building does not sound like it tackles the core problem here even though it may improve performance in a different way.

basz commented 9 years ago

I am not doing an incremental build here... Shall I create a short movie demonstrating the issue?

Op 4 mrt. 2015 om 17:17 heeft Kevin Herrera notifications@github.com het volgende geschreven:

When you run box build, the entire archive is rebuilt from scratch.

Unfortunately, incremental building isn't supported in this version of box, but it is planned for the next version (#84).

— Reply to this email directly or view it on GitHub.

kherge commented 9 years ago

Sure, but I think I might understand the issue. As you build the *.phar file, you notice the file size fluctuating as the build is running, is that correct? I think zip does the same thing. I'm guessing this is just how the phar extension is choosing to build the phar file, which I have no control over. Box itself only removes and recreates the phar once in the very beginning of the process.

basz commented 9 years ago

Yes correct, that's the issue... However I don't think zip does behave in a similar fashion - why would it - as it is perfectly possible to just append data to files. It -feels- like an incompatible configuration option somewhere, because I can't be the first one to try this including large libraries such as zf2 or symfony can I? Anyway, i was shopping someone would recognize a quick solution - not really a problem for me as at this time I was only doing unpaid experimentations. thanks

arjanvdbos commented 9 years ago

Same problem here, mid size ZF2 + Doctrine project takes about 1,5 hours to build. I really love the simplicity of box, but this makes it useless for me.

It seems that Phing's PharTask had the same problem (https://www.phing.info/trac/ticket/782), they solved it by using Phar::buildFromIterator() which should massively speed up phar generation.

Bilge commented 9 years ago

@arjanvdbos That seems like some pertinent information. I hope @kherge is taking note.

kherge commented 9 years ago

The bug ticket mentions that they switched from a one-by-one approach to buildFromIterator(). The one-by-one approach is required by Box in order to process each individual file. While we may likely see a boost in performance, we won't be able to make any changes to those files.

I can probably make this work, but you may only see the performance improvement when none of the file processors available to Box are used.

basz commented 9 years ago

I'm not familiar with the steps box takes, but it might work and still be a lot faster if you copy and process files first to a tmp dir and then import them with a buildFromIterator.

Op 6 apr. 2015 om 17:12 heeft Kevin Herrera notifications@github.com het volgende geschreven:

The bug ticket mentions that they switched from a one-by-one approach to buildFromIterator(). The one-by-one approach is required by Box in order to process each individual file. While we may likely see a boost in performance, we won't be able to make any changes to those files.

I can probably make this work, but you may only see the performance improvement when none of the file processors available to Box are used.

— Reply to this email directly or view it on GitHub.

kherge commented 9 years ago

Hah, I like your idea much better!

The current process is something like this:

for each file listed in box.json
    read its contents into memory
    process the contents
    add it to the phar using addFromString()
    move on to next file

The new one will look like this:

for each file listed in box.json
    copy it to a temporary directory
    process the contents
    move on to next file

add the temporary directory using buildFromIterator()
arjanvdbos commented 9 years ago

Good to see this issue is reopened!

I didn't look at the internal working of Box, but in my opinion it is not required to copy and process the files first, and then generate the Phar. You could also implement a sort of collection class which implements a Iterator with a callback. In the callback you can handle all the pocessing stuf.

webdevvie commented 8 years ago

Hi,

I wanted to reduce my build times from 325 seconds a bit :) So I built two branches for both box2 and box2-lib

https://github.com/webdevvie/box2/tree/faster-adding-with-excluded-regexp https://github.com/webdevvie/box2-lib/tree/faster-adding-with-excluded-regexp

It adds a config value to the box.json file for box2: "exclude-from-value-replace":[ "/^vendor(.*)/i", "/^src(.*)/i" ]

and uses an ArrayItterator to add the files from a file queue to the phar file.

This reduced the build time for my project to 10 seconds. Which is more than acceptable for me.

I hope this is useful to you. I ran the unit tests for box2, and they are green. But the box2-lib unittests throw a warning about trying to read a fopen(/does/not/exist) but the tests are still green.

Hope this helps out.

My branches: https://github.com/webdevvie/box2/tree/faster-adding-with-excluded-regexp https://github.com/webdevvie/box2-lib/tree/faster-adding-with-excluded-regexp

kalehrishi commented 6 years ago

Anyone working to get this feature to main build? @webdevvie How can I use your version?

webdevvie commented 6 years ago

@kalehrishi I added the specific fork and branches to my composer.json

i added this to my require array:

        "kherge/box": "dev-faster-adding-with-excluded-regexp as 2.7.2",
        "herrera-io/box": "dev-faster-adding-with-excluded-regexp as 1.6",

then added this:

"repositories": [
        {
          "type": "vcs",
          "url": "git@github.com:webdevvie/box2.git"
        },
        {
          "type": "vcs",
          "url": "git@github.com:webdevvie/box2-lib.git"
        }
    ],

until the branches are merged or some other solution is added.

There are several open pull requests for both projects for over a year now so I think this project is in the phantom zone or something.

kalehrishi commented 6 years ago

@webdevvie Thanks a lot.