berzerk0 / Probable-Wordlists

Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!
Creative Commons Attribution Share Alike 4.0 International
8.71k stars 1.61k forks

Compress the files #10

Closed: anatoli26 closed this issue 7 years ago

anatoli26 commented 7 years ago

Please compress the files. .tar.gz, .tar.xz and .zip versions of single files or entire folders (+ #4) would be great! Top35Million-probable.txt is 369 MB uncompressed; compressed with xz it's just 85 MB. One could check the contents of the gzip versions with zcat or zgrep -a (or of the xz versions with xzcat or xzgrep -a) without first uncompressing them.
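For anyone who would rather do that from a script than from zcat or zgrep, here is a minimal Python sketch of the same idea. It assumes plain single-file .gz or .xz wordlists (not .tar.* archives), and the helper names `open_wordlist`, `head` and `grep` are made up for illustration, not part of this repo.

```python
import gzip
import lzma
from itertools import islice
from pathlib import Path

# Map a file suffix to the stdlib opener that can stream that format.
OPENERS = {".gz": gzip.open, ".xz": lzma.open}


def open_wordlist(path):
    """Open a .gz or .xz wordlist as a text stream, decompressing lazily."""
    opener = OPENERS[Path(path).suffix]
    # errors="replace" keeps reading past non-UTF-8 bytes, roughly like `zgrep -a`.
    return opener(path, "rt", encoding="utf-8", errors="replace")


def head(path, n=10):
    """Return the first n entries without decompressing the rest of the file."""
    with open_wordlist(path) as lines:
        return [line.rstrip("\n") for line in islice(lines, n)]


def grep(path, needle):
    """Yield every entry that contains needle, one line at a time."""
    with open_wordlist(path) as lines:
        for line in lines:
            if needle in line:
                yield line.rstrip("\n")
```

Something like `head("Top35Million-probable.txt.xz", 5)` would then show the most probable entries; that filename is only an assumption about how a compressed copy might be named.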

berzerk0 commented 7 years ago

I hadn't thought of using multiple compression formats - why not?

anatoli26 commented 7 years ago

.zip for Windows users, .tar.gz for general-purpose utilities like zgrep, .tar.xz for the highest compression. You could compress each file separately, then put them all together in a single torrent so users could download only the files they need, but from a single source. And you could name each new version of the torrent v1, v2, etc.

berzerk0 commented 7 years ago

Sounds like a good plan - I had planned to have different releases as things came in. I'm currently fixing up Rev 1.0 to release as Rev 1.1 by the end of the week.

Rev 2.0, which is likely to be WAY more specialized, is farther off.

berzerk0 commented 7 years ago

Oops, didn't mean to assign you

magnumripper commented 7 years ago

I can't see the point of tripling the size of the git repo by having three versions of each file. It's just more work for you.

Users on any OS can use any of the mentioned formats, for crying out loud!

berzerk0 commented 7 years ago

Until I've got the torrents set up, I thought it'd be simpler to host the smaller Wordlists in the repo. The largest Wordlists are stored offsite (and will continue to be).

In future releases, everything will be packed away nice and tight into torrent files of manageable size.

Estimated time until Release 1.2 - with torrents - less than 2 weeks.

anatoli26 commented 7 years ago

@magnumripper, actually, with compression the total size, even with the 3 formats together, will be less than it is now. Anyway, the idea is to include everything in a single torrent (or 3 different torrents, one per compression format) and for users to select which files to download before starting the torrent.

Also, all the additional work is just typing one additional command per format:

for gzip: find . -type f ! -name "*.xz" ! -name "*.gz" ! -name "*.zip" -exec gzip -k {} \;

for xz: find . -type f ! -name "*.xz" ! -name "*.gz" ! -name "*.zip" -exec xz -k {} \;

for zip: find . -type f ! -name "*.xz" ! -name "*.gz" ! -name "*.zip" -exec zip {}.zip {} \;
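If the command-line tools aren't at hand, roughly the same three passes can be sketched with Python's standard library. This is an illustration under assumptions, not how the repo is actually built: `compress_each` and `total_size` are hypothetical names, and the zip member is stored under its bare filename rather than its relative path.

```python
import gzip
import lzma
import shutil
import zipfile
from pathlib import Path

# Suffixes treated as "already compressed", mirroring the ! -name filters.
COMPRESSED = {".gz", ".xz", ".zip"}


def compress_each(root):
    """Write NAME.gz, NAME.xz and NAME.zip next to every plain file under root."""
    # Snapshot the file list first so freshly written archives are not revisited.
    sources = [p for p in Path(root).rglob("*")
               if p.is_file() and p.suffix not in COMPRESSED]
    for src in sources:
        for suffix, opener in ((".gz", gzip.open), (".xz", lzma.open)):
            with open(src, "rb") as fin, opener(f"{src}{suffix}", "wb") as fout:
                shutil.copyfileobj(fin, fout)  # streams; the original is kept, like -k
        with zipfile.ZipFile(f"{src}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname=src.name)
    return sources


def total_size(root, suffix):
    """Sum the sizes in bytes of all files under root whose names end in suffix."""
    return sum(p.stat().st_size for p in Path(root).rglob(f"*{suffix}"))
```

Comparing `total_size(root, ".txt")` against the sum for ".gz", ".xz" and ".zip" is one way to check the size claim on a given set of wordlists.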

@berzerk0, thanks for the progress update!

berzerk0 commented 7 years ago

Zipping into:

  1. Tar.xz (done)
  2. Tar.gz (done)
  3. 7z ultra compression (done)
  4. LZMA Zip (done by tomorrow)
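For reference, three of the four formats in that list can be produced with Python's standard library alone. This is only a sketch and not necessarily how the archives here were made: `pack_folder` is a hypothetical helper, and 7z is left out because the standard library has no 7z writer.

```python
import tarfile
import zipfile
from pathlib import Path


def pack_folder(folder, out_dir):
    """Pack folder into NAME.tar.xz, NAME.tar.gz and an LZMA-compressed NAME.zip.

    out_dir should live outside folder, so the archives do not pack themselves.
    """
    folder, out_dir = Path(folder), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    made = []
    for mode, ext in (("w:xz", ".tar.xz"), ("w:gz", ".tar.gz")):
        target = out_dir / (folder.name + ext)
        with tarfile.open(target, mode) as tar:
            tar.add(folder, arcname=folder.name)  # recurses into the folder
        made.append(target)
    target = out_dir / (folder.name + ".zip")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_LZMA) as zf:
        for p in sorted(folder.rglob("*")):
            if p.is_file():
                # Store members as NAME/relative/path, matching the tar layout.
                zf.write(p, arcname=p.relative_to(folder.parent).as_posix())
    made.append(target)
    return made
```

Note that some older unzip tools cannot read LZMA-compressed zip members, which is one reason to keep a more conventional format alongside it.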

Then it's onto the seedbox and towards 1.2

anatoli26 commented 7 years ago

great!

berzerk0 commented 7 years ago

All compressed formats available on Mega.nz (or at least uploading to it as of this posting)

All but the 4 largest are there now, as of 14:22 EST, 12 May 2017.

Seedbox was up, then seedbox was down. Put in a support ticket.

berzerk0 commented 7 years ago

4 styles of compression are now live in torrents.