Closed GoogleCodeExporter closed 8 years ago
I guess I'm spoiled too, as I've not had to do printf debugging in ages and
instead rely on gdb with breakpoints. Anyway, I basically try ~3-4 games
of varying types. Currently the testing is on the following games:
Capcom Classics Remixed (arcade games emulated), Dungeon Siege Throne of
Agony (the game that seems to do the most random reads), and Untold Legends Brotherhood
of the Blade. For right now I think that should be enough. Mostly it's
repeatedly doing the same thing over and over with cso, iso, and zso, along with
the random sleeps, since I now know that was part of the issue.
The first game is the only one with a lot of random files, I think. I may try a
gta game because they also have a ton of random files, as the other two are just
big wads with most of the files in them.
I'm hoping that that's enough; mostly I'm doing map changes and things that cause the
game to read stuff from the memory stick.
Also, my cache is 23MiB with the default 256K. The reason for 23 instead of
20? I'm not doing online stuff and I'd like to hold as much as possible.
Original comment by 133794...@gmail.com
on 8 Jul 2014 at 10:07
OK, it seems to have fixed those issues. I can't seem to find any way to break
the thing, and I finally found a game that has a ton of random files all over
the disk. It's Marvel Ultimate Alliance 1/2; they're both ~1.7GiB, with only
~300MiB or so in a solid file and the rest in random directories all over. I think
the reason it's so big is that the game has something like 20 or so
characters, all able to speak various lines all through the entire campaign,
along with audio samples all over the battles.
It's seemingly doing OK right now, but my god, I can't imagine playing it
without the isocache, as the thing does a loading screen to open the menus and
even between them. I know the game was on the ps2, which had ~12MiB more ram
than the psp available to devs, but even with csos the loading screens aren't
that bad. I'm going to try the csotest on it though. So far I've been doing multiple
load screens and trying my darndest to trick the thing by randomly
forcing sleep during loading screens and the like, and it's still holding up OK
with iso, zso, and cso.
Original comment by 133794...@gmail.com
on 9 Jul 2014 at 3:39
What's the point of having lz4 compression? I don't see any major size diff, so
what will the benefit be?
Original comment by shatteredlites
on 11 Jul 2014 at 9:53
Speed. The lz4 algorithm combined with the rewrite of the read functions means
that the read speed of compressed isos will be almost the same as that of an
uncompressed one. There are benefits with the patch even if one doesn't use
lz4. In fact, the size of an lz4-compressed file is bigger than a cso.
Original comment by codestat...@gmail.com
on 11 Jul 2014 at 10:45
For me lz4 isos are not only way faster than csos, but sometimes they're as fast
as isos, and sometimes they can even end up being very slightly
faster. That alone makes it better than gzip, and an amazing thing to have.
Also, as far as the size of the lz4 compressed ones goes: if you're using lz4hc, on
average it'll give you about the same compressed size as level 6 on gzip/cso files,
whilst being dozens of times faster and at the very least 2-3x as fast to
decompress on the psp.
Also, compressing to a zso makes a smaller file than the eboots from
popstation. I wish that the popsloader spoke it, but apparently that's using its
own internal system.
Finally, lz4/lz4hc is being used almost everywhere now. All frostbite games, aka
almost all EA games, are using lz4hc for all of their files, and filesystems are using
it to make everything faster. Originally lzo was filling this requirement, but lz4
is way faster. If you're running zfs (bsd) or btrfs you can use lz4 compression
and actually gain IO speed, even on ssds.
So, to say it one final time: speed, speed and more speed. csos used to take
forever; with the update they don't take as long, but with zsos you can have
compression (that's decent most of the time) whilst also not suffering from
vastly increased load times.
The exceptions are 1k tiny claws (the game uses lz4hc already) and other minis,
which don't compress that well due to how small they are, and marvel
ultimate alliance/the warriors, both of which seem to be originally ps2
games that were moved over to the psp. But generally I get ~3% worse than cso
-9 with zsos and way faster load times, so for me this is amazing.
Finally, as far as testing goes, I haven't found any issues yet with loading any
of the games that I've tried, and I've been trying to find some way to trick the
system into not running OK by tweaking the iso cache values and randomly
putting it to sleep. It seems to be holding up perfectly atm.
Original comment by Jimmydea...@gmail.com
on 11 Jul 2014 at 11:03
Hmm, I wonder whether this would fix the frame drop in GTA games? I'm not sure if this
is an issue that was in the game when it was made, or if it's the loading times
of a compressed cso.
Original comment by shatteredlites
on 12 Jul 2014 at 10:05
>earlier in the thread if you managed to I don't know do a search.
>On my tests i managed to play GTA Liberty city stories from a CSO compressed
>at level 9 (cpu at 333 and ms access speedup enabled) without virtually any lag
>whatsoever.
>Also tested other 2 compressed games with similar results.
And that's with a CSO; a zso has much, much lower latency/lag times and thus
works a ton better. Finally, in my testing of this I have had one single weird
psp crash, where it just shut itself down after constantly reading data (or
writing it) on the psp. I'm currently trying to get it to happen again without
sleeping it as I did before.
But anyway, testing is still going semi-OK besides that one single weird crash,
but until I can force it to do it again, I'll try to start up psplink.
Original comment by Jimmydea...@gmail.com
on 12 Jul 2014 at 10:14
OK, I've been trying to find a way to get this thing to crash in a way that's
repeatable, but I'm unable to do so. The warriors seems to be crashing, but
that could have just been a bad iso on my part. On games that ran, I can't
get it to crash repeatedly with a known good iso/zso/cso, so it seems to be OK
right now.
Original comment by 133794...@gmail.com
on 2 Aug 2014 at 2:47
This is an interesting change. I was looking into the cso format and also
thought lz4 would be better, and found this.
I personally think it would be more interesting to mix deflate and lz4. This
would allow, for example, faster decompression of some sectors, and better
compression of others. The "plain" flag could easily be used for this purpose,
as long as the format guaranteed that a block of size 2048 would never be
compressed (why would it, anyway, except to waste cpu time?) This would be
fairly easy in the code.
The read reduction is definitely a good optimization. A small tweak I might
recommend is that if the data is compressed, and dst <= src - block_size, a
memcpy() isn't necessary - can just decompress directly to dst. This might
happen with a 32KB read where the compression ratio is decent, for example.
Though, this might make it harder to cache the block (are you doing that?)
I'd like to note that CISO_DEC_BUFFER_SIZE is way too big. It's 8KB, but no
matter what code path, it'd take a 128GB iso to use 2176 of it (with 2048 byte
blocks.) That being said, I think allowing for a larger block size would be
nice. Largely, this would only mean replacing some instances of
ISO_SECTOR_SIZE and 2048 with g_CISO_hdr.block_size (which could be changed to
a shift value if you're into micro-optimization.)
A larger block size (e.g. 4KB) would halve the index size (improving its
cache), and allow for better compression. For >= 4KB reads, this might result
in faster reads. It might slow down sector-by-sector reads, though. This
could be cured by using CISO_DEC_BUFFER_SIZE as a cache in decompress_block()
(which might also help multiple small < 2KB reads in the same sector?)
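To put rough numbers on the index-size point, here is a small sketch. It assumes the commonly described CSO index layout of one 32-bit entry per block plus one final end-of-data entry; the function name and the 1 GiB example size are made up for illustration.

```python
def cso_index_bytes(iso_size: int, block_size: int) -> int:
    """Size in bytes of a CSO-style block index (assumed layout)."""
    # Round up so a partial final block still gets its own entry.
    total_blocks = (iso_size + block_size - 1) // block_size
    # One uint32 per block, plus one extra entry marking the end of the data.
    return (total_blocks + 1) * 4


ONE_GIB = 1 << 30  # example ISO size, not taken from any real game
for block_size in (2048, 4096, 8192):
    print(block_size, cso_index_bytes(ONE_GIB, block_size))
```

Under these assumptions the index shrinks from about 2 MiB at 2048-byte blocks to about 1 MiB at 4096 and 512 KiB at 8192, which is why a larger block size makes the index easier to keep cached.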
However this is finalized, I can add support to ppsspp as well.
-[Unknown]
Original comment by unknownb...@gmail.com
on 25 Oct 2014 at 6:24
@Unknown
When I first suggested that we do lz4, it was because the linux kernel was
proving that it did very well with 4k sectors and above. I also believe that
there's been talk about trying 4k sectors, as it'd improve the
compression ratio for all of the files. I'm sure that this would greatly help games
that have a ton of their space taken up with pre-compressed files, as you'd get
more of the fluff off of them. Now, in terms of mixing deflate and
lz4, that wouldn't do much at all if you look at what I've said above.
lz4 hc is about the same compression ratio as deflate level 6, and on psp umds
the difference between level 6 and level 9 at 2k is minuscule most of the time.
From what I've seen it's a ~3-5% difference at most, which is next to nothing, and
mixing, if you ask me, would greatly increase the complexity of the decompression
algorithm. What's more likely to do better is adding the ability to do 4k
sectors and then having the option to store a sector uncompressed if the data is
not compressible.
Original comment by 133794...@gmail.com
on 25 Oct 2014 at 7:07
As mentioned, it would be a small code change to support both compression
methods simultaneously. I say this as a developer looking at the patch. No
need for you to guess that it might be a complex change.
Supporting alternate block sizes is certainly a more complex change, but
probably fairly easy to verify by tweaking ciso.py to spit out an iso of
alternate size. It should be easy with the "ng" path, as well, more trouble
with the fallback.
-[Unknown]
Original comment by unknownb...@gmail.com
on 25 Oct 2014 at 7:22
I did all of the testing on the patch(es) previously, and as I said, I don't
think it'd help much at all. That's what I was talking about. The relative
complexity isn't a huge thing beyond having to redo all of the build tests by
repeatedly trying to catch the driver in an unknown state. But since this is
about trying to make the cisos smaller, I don't see many reasons why
deflate/lz4 would honestly improve much of anything. On files that don't
compress well, neither deflate level 9 nor lz4 hc compresses them very well. I
can't honestly think of many cases where it'd truly be a better decision than simply
upping the block size.
Original comment by 133794...@gmail.com
on 25 Oct 2014 at 7:42
Well, this can easily be measured.
I've added experimental support for lz4 in my cso compressor I've been toying
with (mostly playing with libuv), and support for both cso and my proposed
format.
https://github.com/unknownbrackets/maxcso/releases
It can take existing cso, dax, or zso files as well, so no need to decompress
your inputs.
Using Crisis Core (ULUS10336) as an example, I've put some data at the end.
Conclusions are here, scroll down for the data.
At block size 2048, using lz4 definitely comes at a cost. 8 points worse
compression ratio, which is certainly more than 3%.
For a user who wants to maximize space, mixing lz4/deflate is a win. They get
a smaller AND faster loading file.
For block size 8192, the first impression is obviously "huge win." This varies
widely by input file, though, and larger block sizes may hurt decompression
speed in some scenarios.
With this larger block size, you lose only 4 points to lz4, which is clearly
better. Again, mixture allows for slightly better compression and lots of lz4
usage (so better decompression speeds than zlib alone.)
As far as complexity, here's the patch that enabled both lz4 and deflate
reading (as well as zso) in maxcso:
https://github.com/unknownbrackets/maxcso/commit/cc55a619fb852f91dd8424aef33594eaf79840c0
Very simple. I've also written up a more formalish spec of the format:
https://github.com/unknownbrackets/maxcso/blob/master/README_CSO.md
Personally I think supporting 2K, 4K, or 8K block sizes would be ideal. It has
clear benefits, at least in some games, despite the change cost. lz4+deflate
allows everyone to win: same or better compression ratios, faster loading
times, and people can still use 100% lz4 if they want. Plus it's easy to
support if adding lz4 and retaining cso v1 anyway.
-[Unknown]
Data:
Original ISO: 1716715520
Block size 2048 (standard):
  ZSO (with lz4hc level 16): 1134591998 (66.09% ratio, 66.42% blocks lz4)
  CSO (with default zlib level 9): 1006322266 (58.62% ratio, 71.06% blocks deflate)
  CSO (with 7zip + zlib level 9): 996709608 (58.06% ratio, 72.28% blocks deflate)
  CSO v2 (7zip + zlib 9 + lz4hc 16): 996393914 (58.04% ratio, 4.65% lz4, 69.83% deflate)
  CSO v2 (zlib 9 + lz4hc 16 5% bias): 998737678 (58.18% ratio, 24.02% lz4, 49.94% deflate)
Block size 8192:
  ZSO (with lz4hc level 16): 563252647 (32.81% ratio, 100% blocks lz4)
  CSO (with default zlib level 9): 499600821 (29.10% ratio, 100% blocks deflate)
  CSO (with 7zip + zlib level 9): 493327409 (28.74% ratio, 100% blocks deflate)
  CSO v2 (7zip + zlib 9 + lz4hc 16): 491593962 (28.64% ratio, 29.80% lz4, 70.20% deflate)
  CSO v2 (zlib 9 + lz4hc 16 5% bias): 497000736 (28.95% ratio, 54.98% lz4, 45.02% deflate)
Block size 4096 (selected):
  ZSO (with lz4hc level 16): 575563344 (33.53% ratio, 100% blocks lz4)
  CSO (with default zlib level 9): 517909560 (30.17% ratio, 100% blocks deflate)
  CSO v2 (7zip + zlib 9 + lz4hc 16): 506935875 (29.53% ratio, 31.25% lz4, 68.75% deflate)
Block size 65536 (selected):
  ZSO (with lz4hc level 16): 541828787 (31.56% ratio, 100% blocks lz4)
  CSO (with default zlib level 9): 478272771 (27.86% ratio, 100% blocks deflate)
For your reproduction, the arguments in order are (append --block=X for block
sizes):
maxcso in.cso -o out.zso --format=zso
maxcso in.cso -o out.cso --fast
maxcso in.cso -o out.cso
maxcso in.cso -o out.cso --format=cso2
maxcso in.cso -o out.cso --format=cso2 --lz4-cost=5
Original comment by unknownb...@gmail.com
on 1 Nov 2014 at 11:01
Have you done some tests with games that load data continuously from the ISO
(like GoW: GoS and GTA games)? These patches were done with the idea of allowing
the user to have a fairly compressed game while enjoying a lag-free experience
like an uncompressed ISO.
About storing uncompressed sectors: this is already supported by the ciso.py in
this project. One can choose the compression threshold at which the compressed
block is discarded and the uncompressed block is used in its place.
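The threshold idea can be sketched roughly like this. This is not the actual ciso.py code: `pack_block` and `threshold_pct` are hypothetical names, and the raw deflate stream (negative `wbits`) is an assumption about what CSO tools write.

```python
import zlib


def pack_block(block: bytes, level: int = 9, threshold_pct: int = 100):
    """Return (payload, is_plain) for one sector-sized block.

    threshold_pct is a hypothetical knob: the compressed block is kept only
    if it is smaller than this percentage of the original block size.
    """
    # Raw deflate stream: negative wbits means no zlib header or checksum.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    packed = compressor.compress(block) + compressor.flush()
    if len(packed) * 100 < len(block) * threshold_pct:
        return packed, False
    # Not worth it: keep the original bytes and mark the block as plain.
    return block, True


payload, is_plain = pack_block(bytes(2048))  # an all-zero sector compresses well
print(len(payload), is_plain)
```

Lowering `threshold_pct` stores more blocks uncompressed, trading file size for fewer decompression calls at read time.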
I agree completely that the cso format needs to work with bigger sector sizes,
but the problem is that procfw assumes in a lot of places that the sector size
is 2048. A deduplication of code is needed so this can be maintained more
easily (right now, Inferno, Galaxy and vshctrl each use their own decompression
code for cso).
Original comment by codestat...@gmail.com
on 2 Nov 2014 at 12:09
I'm not talking about uncompressed sectors. I'm talking about lz4 vs deflate
sectors. My proposed "cso v2" format allows for storing both lz4 and deflate
in the same file (as well as uncompressed blocks, purely based on the size of
the compressed block... >= block_size means uncompressed, since it's silly to
compress it in that case anyway, just wastes cpu time.)
I'm mostly concerned about having a good format that can improve and make sense
for both cfw and emulators (like ppsspp) to support. The difference of 5% can
mean gigabytes in total, but lz4 can nevertheless improve load times on both
the PSP and on Android devices.
Anyway, the format itself does support larger block sizes, and ppsspp already
handles them since this pull: https://github.com/hrydgard/ppsspp/pull/7027
But yeah, I saw the trouble with duplication and hardcoded/misused vars, so I
can see how supporting larger block sizes is a pain. Still, I'd like to see a
mixed format (lz4 + zlib) rather than an lz4-only format, considering how easy
it would be to support. All it takes is setting lz4_compressed for the
0x80000000 flag, and reading raw data for when size >= g_CISO_hdr.block_size.
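A minimal sketch of that reader-side rule, going only by the description in this comment: the high bit of an index entry marks lz4, and a stored size of at least the block size means raw data. The names `classify_block` and `index_shift` are illustrative, and any index alignment padding is assumed to be covered by the shift.

```python
LZ4_FLAG = 0x80000000     # high bit of an index entry marks an lz4 block
OFFSET_MASK = 0x7FFFFFFF  # remaining bits hold the (shifted) file offset


def classify_block(entry: int, next_entry: int, block_size: int,
                   index_shift: int = 0):
    """Return (file_offset, stored_size, method) for one index entry.

    method is "raw", "lz4" or "deflate".
    """
    offset = (entry & OFFSET_MASK) << index_shift
    end = (next_entry & OFFSET_MASK) << index_shift
    size = end - offset
    # A block stored at full size was never compressed, whatever the flag says.
    if size >= block_size:
        return offset, size, "raw"
    if entry & LZ4_FLAG:
        return offset, size, "lz4"
    return offset, size, "deflate"


print(classify_block(0x80000100, 0x400, 2048))
```

The flag of the following entry is ignored when computing the size, since it describes the next block rather than this one.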
As for testing; I have not done much speed testing on a PSP device. I'm
assuming that lz4 is faster from your testing (I mean, I know it's faster on
desktop, obviously.) If the io read size is approximately the same, and lz4 is
faster, then a file of roughly the same size but composed of 30% lz4 blocks
(and 70% deflate) will naturally decompress faster than one of 100% deflate.
However, I can do some benchmarks on a PSP, maybe next weekend. Reading the
above a bit more, I think I misunderstood and thought the csotest was run on a
PSP not a PC.
-[Unknown]
Original comment by unknownb...@gmail.com
on 2 Nov 2014 at 2:05
Oh, but I actually have neither of the games you mentioned (except the demo of
GoW: GoS, but not sure how to get hard numbers out of running a game anyway.)
I'd have to do synthetic tests. I can just read the cso manually and generate
timings based on lz4 vs zlib and possibly the impact of block sizes. If you
have the access patterns, that would help (e.g. does it generally re-read the
same blocks but after reading other non-sequential blocks? or does it just
read different blocks all the time?)
-[Unknown]
Original comment by unknownb...@gmail.com
on 2 Nov 2014 at 2:17
Which levels did you do for lz4? There is _no_ lz4hc level 16, I don't know
where you're getting it from. The api has no such thing as those levels. It's
lz4 or lz4hc in terms of the api. So I don't know where you're getting it from.
Ciso also only does lz4 and lz4hc.
Finally, lz4hc was chosen because it takes way less cpu time in terms of
decompression, which is a _huge_ thing for the psp, as not everyone has millions
of cpu cycles to waste. With zlib, the higher the level, the more cpu time is taken
during decompression, whereas lz4hc uses the same decompressor as lz4, so it's
constant speed in terms of decompression.
Also, the whole point of lz4hc is to compare it with zlib level 6. As I had said
previously, that's what lz4hc was being compared against: the zlib level 6
compressor. If you're comparing it with level 9, obviously it's going to lose
more. I said it's in general ~3% more compressibility for level 9 vs
level 6 in most of the games.
And I know for a fact that the games lag like balls when you're trying to play
it with zlib level 9, hell even level 6 most of the time _increases_ my loading
times for the games that I've tested. It only makes it slower, so why would I
want to increase them?
LZ4 was made to decompress at maximum speed humanly possible.
The csotest was done to make sure that there were _no_ issues in a series of 500
reads. It's a basic test of the iso driver to check for memory leaks/obvious
problems. It still didn't catch everything, as my own testing showed.
And as far as 30% lz4 and 70% deflate, you're still going to end up harming the
loading speed. There's very few places that lz4hc will be that far behind zlib
in terms of the sectors that compress better with zlib. I was playing marvel
ultimate alliance which is a game with a crapton of loading, it's constantly
streaming stuff in.
Each of the characters has their own catch lines or whatever, and they're always
being swapped in and out, and loading is pretty commonplace all throughout the
thing. It was also a game that didn't compress that well. zlib level 6 made my
avg load times from my memory stick ~13s worse compared to raw iso, whereas
lz4hc was about the same and in some cases a bit better. It was +/-2s from the
default in most cases, as the vast majority of the game's assets are
atrac3 (or what I imagine is atrac3) compressed sound files; it's almost
500MB or something of the entire iso.
So anyway yeah, the whole reason was to add compression support that made the
games load better whilst also providing more space. And with intermingling them
I can't imagine it'd be really worth it that much in terms of the format
itself. I can't see it doing much good as if you did tests with zlib level 6
intermingled with lz4hc(what is the good compression ratio for both) it'd not
end up with much of an improvement.
LZ4 is meant to be blazingly fast and is open source, lz4hc is meant to be
slower to compress but just as blazingly fast to decompress. My tests were on a
mips cpu of 1ghz with mddr ram which is slow as balls. I don't know how
slow/fast the psp's ram is but I do know that lz4hc is always faster loading
versus zlib level 6.
Original comment by 133794...@gmail.com
on 2 Nov 2014 at 2:27
Also, if you read it, it's 3-5% better compression of a cso when using zlib
level 9 vs level 6. As in, it's ~5% better compression for zlib level 9 versus
zlib level 6, and lz4hc is similar to zlib level 6 in almost all cases and
loses a few percent to it, but it greatly makes up for it by not
wasting precious cpu time.
As for android devices, lz4 will likely have a good result on them like on the psp,
though probably not as pronounced, as most android phones have plenty of cpu and ram, so
it's not going to be the difference between waiting forever and loading at an
OK speed.
I'll probably try to redo the God of War and gta games again and compare the
loading times, but I know that for the games that I did test, zlib was always
slower than lz4(hc) and most of the time made the loading times much worse.
Since you're bottlenecked by the cpu trying to decompress the blocks instead of
being IO bound (as it normally is with the uncompressed games), lz4 is always
going to have some effect on the games depending on how well it compresses, but
it hasn't made anything that I've played lag worse than it did with plain iso.
Original comment by 133794...@gmail.com
on 2 Nov 2014 at 2:34
Yes, the csotest was run from a PC. Debugging on psp is a pain and I needed to
use valgrind to proof-test the read rewrite that I was doing.
I suppose that I can generate the read patterns of GTA and make a log with the
type of blocks that it tries to read. The only thing that I can remember off
the top of my head right now is that GTA reads in blocks of ~80KB and the lag
happened because the next read was not ready at the time it needed to continue
loading the city.
Anyway, the real bottleneck of the cso format was in the read method (2k block
reads at a time totally killed any performance gains), and the deflate/lz4
compression doesn't have that much impact.
With my limited time, the most that I can do is to work on adding support for
the cso2 format and variable block size to the Inferno driver and vshctrl (so
the games can be displayed on the XMB).
BTW, I haven't done many tests, but the psp memory in kernel mode is really
scarce. With more than 64k of block size there is a chance that the game won't
run at all if the user has enabled many heavy plugins. Not sure if this limit
should be enforced by the format or not, since it only affects the psp (and the
pspemu on the vita).
Original comment by codestat...@gmail.com
on 2 Nov 2014 at 2:39
One last final post for today, as far as the tests go.
Here's how I did it.
I got the game, I did the same thing over and over ~10 times in a row timing it.
After I did the test, I did a cold boot again(to make sure there's nothing
lying around in the iso cache) and then did it over and over.
For some of the games, I simply loaded my save file and went to some place
where I knew there'd be another loading screen. For others I went through the same
3-4 levels, measuring the timing for them. That's the easiest way to get the
same results (hopefully) from your tests. You do the same thing over and over
and over again and time it. Since you're doing the same thing, the game's
likely to want to do the same things. There'll probably be some variance on gta,
since people move and such.
But for the most part, that's how I measured the loading time differences on my
psp. I kept them all at the same settings, no plugins, everything at the same
level, read the same cso/zso over and over and over again, and went from there.
And synthetic benchmarks are pretty much useless. As they say figures don't
lie but liars can figure, go look at amd/intel/nvidia they all use the same
synthetic benchmark and can game the system.
Look at the estimated MPG here in the US all of the car companies know what the
testing track is and the US only changes it up every 5-10 years. So each car
company can modify their engines so it'll come out better than it actually is.
It's the same thing with android: phone makers knew what the benchmarks were
and were able to make their phones score higher than they really should. The
only real way to figure out the performance characteristics is to make
a real test and do it in the real world.
I'm sure there's some way to make the system log out all of the lba reads and
you could record those and then use that for your synthetic test by reading
those certain blocks in the same order that the psp did by sticking that memory
stick into your computer and running the tests on it. I've been way too far
behind on trying to put debugging into the cfw's iso reader mainly due to
health issues/life but that could help to give you some information about how
the games do their reads.
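If such a log of reads existed, a first pass over it could look something like the sketch below. The log format here, a list of (lba, sector_count) pairs in the order the game issued them, is purely an assumption; nothing in the cfw produces it today.

```python
def summarize_reads(reads):
    """Summarize an access pattern.

    reads: iterable of (lba, sector_count) pairs in the order they were issued.
    """
    seen = set()
    total = reread = sequential = 0
    prev_end = None
    for lba, count in reads:
        # A read that starts exactly where the previous one ended is sequential.
        if prev_end is not None and lba == prev_end:
            sequential += 1
        for sector in range(lba, lba + count):
            total += 1
            if sector in seen:
                reread += 1  # this sector was already read earlier in the log
            seen.add(sector)
        prev_end = lba + count
    return {"sectors": total, "reread": reread, "sequential_reads": sequential}


log = [(0, 4), (4, 4), (100, 2), (0, 2)]  # made-up example log
print(summarize_reads(log))
```

The re-read count hints at how much a block cache would help, and the sequential count at how much read-ahead would.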
Original comment by 133794...@gmail.com
on 2 Nov 2014 at 2:44
https://code.google.com/p/lz4/source/browse/trunk/lz4hc.c#611
https://code.google.com/p/lz4/source/browse/trunk/lz4hc.h#70
I'm using a value outside the "recommended values", and actually, I'm trying
multiple levels because sometimes a lower level actually saves a couple bytes
vs a higher level.
I realize that python-lz4 doesn't expose it, but that doesn't make it not exist.
You are *absolutely* wrong about zlib. Higher levels DO NOT take more
decompression time. In fact, due to reduced io time, they can take LESS time.
The only reason why a more heavily compressed cso file could read slower is
because of blocks that were uncompressed before and are compressed now.
As an example, look at Google's Zopfli. There's a reason why everyone is
trying to max out the compression level of deflate for the web, and it's
exactly because it DOESN'T increase the decompression time on the other end.
http://techcrunch.com/2013/02/28/google-launches-zopfli-compression/
You can see decompression times here as well (and surely on many other
benchmarks published in the last few decades):
http://tukaani.org/lzma/benchmarks.html
I realize this misinformation is well spread among cso users, but it's just
wrong. There's no evidence to back it up and plenty of evidence to the
contrary.
Nevertheless, lz4 decompresses faster than zlib (at least on most
architectures, and I thought on MIPS as well.) That being said, it's not clear
to me if you are conflating the results of the io optimization codestation made
(which is not directly related to using lz4, but was done with the patch) with
the results of lz4 decompression itself.
Regardless, this doesn't mean the format should not support the use case of
people wanting to save space while at the same time supporting the use case of
people wanting maximum speed. Even if the file being 100% lz4 results in the
absolute best performance, that doesn't mean that should be the only thing the
file format supports. Why not let everyone have increased performance, and
people who care about disk space have some of both?
-[Unknown]
Original comment by unknownb...@gmail.com
on 2 Nov 2014 at 2:50
You're comparing web browsers with a low-powered mips cpu. And I already know of
zopfli. It does little more than 7zip's zip compressor does with 15
iterations, and that's when I was running zopfli with 100 iterations. Also, even with deflopt.exe,
zopfli compresses worse than the previous program plus advzip's compressor on level
4.
Finally, about the "amazing speed of zlib level 9 being so fast."
Look at lz4; I've run the benchmarks and from ram it's very similar for me.
Compressor      Ratio  Compression  Decompression
LZ4 HC (r101)   2.720  25 MB/s      2080 MB/s
zlib 1.2.8 -6   3.099  21 MB/s      300 MB/s
lz4hc decompression is 2GB/s; zlib? 300MB/s. That's an insanely _massive_ difference in
terms of decompression time. I can't imagine that you'd get a better result in
terms of decompression time when mixing the two, considering that lz4hc is
almost an order of magnitude faster than zlib. And you're also wrong about
the levels not being exposed: if you use the cli program for lz4 you have lz4/hc
and you can increase the block size; that's the only "level" that it has beyond
the two different modes. That's it. It'll increase the block size, and the only
other thing that also makes it compress better is making each block depend on
the previous block, which is the default mode of zlib.
In your tests were you making sure that the lz4hc was using inter-block
dependency as is the default mode for zlib? Also you're throwing out the blocks
that weren't compressible to a certain percent it seems. I didn't do that with
my ratios, I did the full thing, it was compressed even if it ended up making
the files bigger. Also ff7 cc is basically just one huge archive file. I don't
know if it's using any compression or if it's just the raw data. I believe it's
just the raw data encrypted some-how.
And zopfli and efforts like it are nice to see, but most sites don't do gzip
compression above level 1 by default. Almost every company out there if they're
doing gzip compression have kept it on level 1 even for static assets.
Original comment by 133794...@gmail.com
on 2 Nov 2014 at 3:21
I forgot to say: right here are the results of ciso with zlib -9 and lz4 hc. And
what do you know, a ~5% difference in terms of compression ratio. Hmmm.... I
wonder why that matches up _entirely_ with my internal results, that it's a ~3-5%
difference in terms of compression ratio in total for them.
$ ./ciso.py -m -a 0 -c 9 ff7_cc.iso ff7_cc.cso
ciso-python 2.0 by Virtuous Flame
Compress 'ff7_cc.iso' to 'ff7_cc.cso'
Compression type: gzip
Total File Size 1716713472 bytes
block size 2048 bytes
index align 1
compress level 9
multiprocessing True
ciso compress completed , total size = 1173993434 bytes , rate 68%
$ ciso ff7_cc.iso
ciso-python 2.0 by Virtuous Flame
Compress 'ff7_cc.iso' to 'ff7_cc.zso'
Compression type: LZ4
Total File Size 1716713472 bytes
block size 2048 bytes
index align 1
compress level 9
multiprocessing True
ciso compress completed , total size = 1267014960 bytes , rate 73%
What do you know, it says _exactly_ what I said, so why are you being so insane,
trying to act like I'm pulling numbers out of my ass? I said it wasn't a
huge difference. This is with the default settings for both, with no minimum
compression threshold, and it's _identical_ to what I had said above. So I still
stand behind my comments, and it seems my memory wasn't failing me at all.
If you're doing something that's _not default at all_, as you were doing, sure,
you may come up with the figures that you've given, but as I've said, a difference
of 5% seems to be the maximum of the scale. So one final time: my figures were
correct, and your hand-crafted benchmark using non-default settings is
nowhere near what mine was.
And finally/for the record it seems like UMDgen's ciso compressor shaves off at
most ~1-2% in the best cases compared with ciso.
Original comment by 133794...@gmail.com
on 2 Nov 2014 at 3:33
codestation: indeed, debugging on the psp is a pain. Sorry, didn't see your
post before, guess I started typing right before you posted.
Well, 80kb blocks sound like they would be most influenced by the io read speed
and the number of io operations. So that is most likely influenced by the
optimization you made. It could even be a simple timing/scheduling thing (i.e.
the actual io may not be taking longer in sum, but it may be scheduling to
other threads... not sure if the sceIoRead() schedules based on the priority of
the calling thread, or the kernel?)
Yeah, I realize memory is scarce. As mentioned, CISO_DEC_BUFFER_SIZE is
already too large. I don't think block sizes larger than 8KB make any sense,
the point of those numbers was to show that the gains got small.
133794: not really sure why you're angry that I get better compression ratios
than you. I'm sorry but I don't really have the time to respond to all the
things you've said, but "everyone uses level 1" and etc... no offense intended,
but I think you need to drink a cold glass of water and actually research these
things before you say them.
Also, I'm not trying to attack you somehow by providing data about compression
ratios and mixtures of formats. I'm sorry if you've somehow gotten that
impression.
As far as "non-default", that's the whole point of programming. If everyone
just said "default is good", Yann would've never created lz4. Not sure why
that makes you angry either. I've provided full source code for everything,
not some sort of "hand crafted" benchmark.
-[Unknown]
Original comment by unknownb...@gmail.com
on 2 Nov 2014 at 4:07
Oops, I had a bug in the cso reading code introduced when messing with the new
stuff, it's fixed now. The ratios above aren't right, but they are right
relative to each other for 2048, which was the entire point anyway.
Block size 8192 was however majorly affected (the bug caused the file to appear
more compressible, especially to lz4):
deflate only: 1113812184 (64.88%)
lz4 only: 1263270481 (73.59%)
deflate+lz4: 1113560557 (64.87%)
deflate+lz4+5% lz4 bias: (65.00%)
The 5% bias means that if deflate would be 80% the size, then it will use lz4
as long as it's <= 85% the size. If deflate were 10%, it would only use lz4 if
<= 15%.
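Read literally, the bias rule described above could be sketched as follows. This is one interpretation of the description, not maxcso's actual implementation; `choose_method` and `lz4_bias_pct` are made-up names, and the bias is taken to be percentage points of the block size.

```python
def choose_method(deflate_size: int, lz4_size: int, block_size: int,
                  lz4_bias_pct: int = 5) -> str:
    """Pick how to store one block, given both compressed sizes in bytes."""
    # Neither codec shrank the block: store it uncompressed.
    if min(deflate_size, lz4_size) >= block_size:
        return "raw"
    # lz4 wins if it is at most lz4_bias_pct percentage points (of the block
    # size) larger than deflate, and it actually shrank the block.
    if (lz4_size < block_size
            and lz4_size * 100 <= deflate_size * 100 + lz4_bias_pct * block_size):
        return "lz4"
    return "deflate"


# With a 2000-byte block, deflate at 80% (1600 bytes) loses to lz4 at 85% (1700).
print(choose_method(1600, 1700, 2000))
```

Integer arithmetic is used for the comparison so the 85% boundary case is exact rather than subject to float rounding.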
Anyway, the trend from 2048 only shows stronger - lz4-only loses just short of
9 points (9 GB on 100 GB of uncompressed isos.) A combination even with some
bias loses very little (0.13 points, so 130 MB on 100 GB of isos.)
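The point differences follow directly from the corrected 8192 ratios listed above. A quick check, using the quoted percentages as-is and the 100 GB library from the same sentence (variable names are illustrative):

```python
# Corrected block-size-8192 ratios quoted above, in percent of the original ISO.
deflate_only = 64.88
lz4_only = 73.59
mixed = 64.87
mixed_biased = 65.00


def points_to_gb(points: float, library_gb: float = 100.0) -> float:
    """Extra space, in GB, that a ratio difference costs on a library of ISOs."""
    return library_gb * points / 100.0


lz4_loss = round(lz4_only - deflate_only, 2)  # lz4-only vs deflate-only
bias_loss = round(mixed_biased - mixed, 2)    # biased mix vs unbiased mix

print(lz4_loss, points_to_gb(lz4_loss))
print(bias_loss, points_to_gb(bias_loss) * 1000)
```

That gives 8.71 points (about 8.7 GB per 100 GB) for lz4-only and 0.13 points (about 130 MB per 100 GB) for the biased mix.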
I should've known something was funny with that strong decrease, oops.
-[Unknown]
Original comment by unknownb...@gmail.com
on 2 Nov 2014 at 6:12
Based on some basic tests, performance of lz4 vs deflate seems mostly as
expected.
LZ4 version: https://github.com/Cyan4973/lz4/commit/c0054caa
Deflate version: 6.60 sceKernelDeflateDecompress
I sampled a small assortment of blocks, which of course are not exactly the
same size but were reasonably close.
Timings are reasonably stable and lz4 takes about 20% as long to decompress at
a block size of 2048. At deflate's best, lz4 was 35% the time.
That being said, we're talking about ~60us vs ~300us per block. I pulled out
my "Mark2" Sony brand Memstick, and I get ~2800us per 8KB, ~2100us for 2KB, per
read. I get faster (~1000us for 8kb, ~700us for 2kb) even over usbhostfs. My
class 10 performs only a little better at ~2200us and ~1600us respectively.
So, for a 2KB read, lz4 can improve at most 15%.
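A back-of-the-envelope version of that estimate, using the rough timings quoted above (~60us lz4, ~300us deflate, and the two memory stick read times for 2KB). Treating the total as read time plus decompression time is a simplifying assumption, and `max_saving` is a made-up helper.

```python
LZ4_US = 60       # approximate lz4 decompression time per 2KB block
DEFLATE_US = 300  # approximate deflate decompression time per 2KB block


def max_saving(read_us: float) -> float:
    """Fraction of (read + deflate) time saved by decompressing with lz4 instead."""
    return (DEFLATE_US - LZ4_US) / (read_us + DEFLATE_US)


for name, read_us in [("Mark2 memory stick", 2100), ("class 10 card", 1600)]:
    print(f"{name}: {max_saving(read_us):.1%}")
```

Under these assumptions the saving per 2KB read is about 10% on the Mark2 stick and about 13% on the class 10 card, in the same ballpark as the 15% ceiling quoted above.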
In comparison, reading from the umd is of course slow. About ~3000us per
random 2kb read after spin up (haven't tested 8kb.) Cached reads are hard to
measure due to the dcache, but they appear to be in the ~300us range (which is
notably faster than my ms even for repeated reads.)
I'm not sure if there's a cache cso reads hit before they hit the cso file, I
assume so.
So, as expected, reducing the read count is sure to have helped much more
significantly than the compression format, which is great.
That being said, lz4 is faster so it's not a bad thing. For larger reads, it
can gain more performance. Not sure what the codesize cost is (memory.)
-[Unknown]
Original comment by unknownb...@gmail.com
on 9 Nov 2014 at 8:48
https://code.google.com/p/procfw/source/detail?r=4bbef137299d2927ce96f7900a2b001e2ccabdff
Original comment by devnonam...@gmail.com
on 16 Dec 2014 at 10:25
Original issue reported on code.google.com by
hastur...@gmail.com
on 22 Aug 2011 at 10:45