HelloZeroNet / ZeroNet

ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network
https://zeronet.io

Big file support #7

Closed: HelloZeroNet closed this issue 7 years ago

HelloZeroNet commented 9 years ago

Torrent-like file splitting and distribution

HelloZeroNet commented 9 years ago

Maybe this needs to wait until DHT support lands, so we are able to find peers for the needed file parts

up4 commented 8 years ago

Hi, me again. Two questions:

  1. Why, precisely (in terms of actual code), are big/large files not supported (many references to "more than 1MB")?
  2. Why does it "need to wait until DHT support"?

Thanks!

HelloZeroNet commented 8 years ago

Some blockers for big files:

Optional file support was added around 6 months ago, so DHT is no longer required for this. ZeroNet was created for dynamic websites, and most site types do not require big file support, so it's not a priority yet. Maybe adding a torrent client as a plugin is a better solution.

up4 commented 8 years ago

Hi!

No harsh feelings against you personally, but I hold a solid grudge against the "please spare Tor for important text-based traffic from the third world" argument. Very pre-Nietzschean as a moral argument; I just can't. Plus, I have discussed it ad nauseam in ZeroChat FR (I think). If I were to fork this repo, modify the storage mechanism (and the download progress interface) without touching the protocol itself, I think people would prefer that version over the official ZeroNet stack because it just makes more sense technically. I'd rather do that than an optional plugin. I'm going to make a pull request when I'm done, and I'm going to let you decide whether you want it in your repo or not.

I just hope we can make this work.

HelloZeroNet commented 8 years ago

A plugin would be nice. I'm planning to do a plugin management interface later that makes it easy to add/remove/install the non-default features.

PepinLeBref commented 8 years ago

Isn't it possible to use Tribler's onion routing for the big files?

HelloZeroNet commented 8 years ago

It's probably possible if we use torrents for big files.

zeronetscript commented 8 years ago

Hello, I've created a p2p stream helper here: https://github.com/zeronetscript/universal_p2p. With this helper running on the visitor's PC, any ZeroNet web page can stream a resource on the fly from BitTorrent in a simple way (a demo site has already been tested by some visitors).

For example, use a normal HTML5 video tag and point its src to

http://127.0.0.1:7788/bittorrent/v0/stream/c12fe1c06bba254a9dc9f519b335aa7c1367a88a/video.mp4

This HTTP request makes the helper's BitTorrent backend download the torrent by infohash and stream the video.mp4 file to the client. With this helper, any ZeroNet website can stream from any existing torrent's resources. I plan to support more p2p backends (for example http://127.0.0.1:7788/ipfs/xxx) and more convenient functions (e.g. directly streaming a file inside a zip archive).
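
For illustration, a minimal Python sketch of a client consuming that endpoint; the URL and infohash come from the demo above, and it assumes the universal_p2p helper is running locally on port 7788:

import urllib.request

# URL scheme from the demo above: /<backend>/<version>/stream/<infohash>/<path>
url = ("http://127.0.0.1:7788/bittorrent/v0/stream/"
       "c12fe1c06bba254a9dc9f519b335aa7c1367a88a/video.mp4")

# The helper resolves the infohash, fetches pieces from the swarm, and
# streams the file over plain HTTP, so any HTTP client (or a <video> tag)
# can consume it progressively.
with urllib.request.urlopen(url) as response:
    first_chunk = response.read(64 * 1024)
print("received", len(first_chunk), "bytes")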

I'd also suggest integrating Tribler into ZeroNet, working the same way as my helper. This would make any existing BitTorrent resource accessible without special pages. The protocol prefix in the URL reserves the ability to support different backends, and keeping a protocol version in it makes protocol upgrades easy.

up4 commented 8 years ago

Solution proposed and being worked on here (I will commit real code in the next couple of days): pull request #521.

alugarius commented 8 years ago

@zeronetscript I really support the IPFS idea. Instead of working against it, we should include IPFS in the future; the network will become amazing. IPFS is being ported to JavaScript and Python, so it should be possible... well, most people do it already, sharing files over IPFS on 0chan for example.

First suggestion: include IPFS in the bundle, started together with ZeroNet, giving more options to site creators. We have to live this dream! ^^

HelloZeroNet commented 8 years ago

BitTorrent support is planned in the next 6 months, which will provide a solution for big files.

Bachstelze commented 7 years ago

If you can forgo secure Tor usage, you can use http://www.cachep2p.com/ in JavaScript.

antilibrary commented 7 years ago

Having a native implementation of IPFS makes complete sense to me as well. We would need to allow support for file packs: instead of having each individual file available for redistribution, we could have a 'folders'-like approach: all images, video course XYZ, all video courses of category ABC, books from id 1 to 100, all books in my shelves, music album DEF, all classical music albums, etc.

The packs could be defined by the site owner with a list of IPFS hashes (or a query that leads to this list). Once a pack is selected to be seeded by the user, it would be downloaded and seeded through the local IPFS daemon. The current implementation's UI is hard to use because nobody will select individual files from the lists. The user is not interested in seeding specific files; he is interested in helping the site by providing a bit of bandwidth and space.

For a site like 0chan, the site owner could pack the different categories and users could then seed a whole category (e.g. seed all files from /dev/). For a site like ZeroWiki, the same thing applies: the site owner could split the seeding into sections (seed all images, seed all pages, seed the history of all pages), or he could use wiki categories (seed all content of category ABC).

One of the benefits of having it on IPFS is that the files are also available to the opennet via the IPFS gateway. So in the case of sites like antilibrary.bit, once users start to seed the book packs, the book files will be available for everyone to download via ZeroNet gateways + the IPFS gateway (e.g. https://bit.no.com:43110/Antilibrary.bit/).

MuxZeroNet commented 7 years ago

I believe the ZeroNet protocol does a better job of protecting integrity and privacy than the original BitTorrent protocol. ZeroNet has the potential to be a more resilient file sharing network.

| Property | ZeroNet | BitTorrent |
| --- | --- | --- |
| Digest | SHA-512/256 (the truncated version) | SHA-1 (vulnerable to the BitErrant attack) |
| Signature | secp256k1 (Bitcoin) | ? |
| Encryption | TLS, on by default, unless OpenSSL is gone | Various, on by default in many clients |
| Link | Bitcoin address, ripemd160(sha256()) | Magnet URI, SHA-1 |
| File List | Signed content.json | BEncoded, not signed |
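
As an aside, the ripemd160(sha256()) link construction from the table is easy to reproduce. A minimal sketch (the public key below is a made-up placeholder, and hashlib's ripemd160 availability depends on the local OpenSSL build):

import hashlib

# Hypothetical 33-byte compressed secp256k1 public key (placeholder bytes)
pubkey = bytes.fromhex("02" + "11" * 32)

# ripemd160(sha256(pubkey)): the 20-byte hash behind a Bitcoin-style address.
# hashlib gets ripemd160 from OpenSSL, so this can raise on builds without it.
digest = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest())
print(digest.hexdigest())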

Current blocks:

HelloZeroNet commented 7 years ago

BitTorrent clients also have the advantage of ~15 years spent on optimization, and the fact that they don't have to worry about dynamic content (e.g. they can identify pieces by a simple ID; on ZeroNet you have to identify them by filename or hash).

antilibrary commented 7 years ago

We could benefit from this: https://ipfs.io/blog/23-js-ipfs-0-23/

haschimoto commented 7 years ago

Isn't the problem with the BitTorrent protocol/client for big files that you lose any protection from Tor?

funny110 commented 7 years ago

When will big file support come out?

MuxZeroNet commented 7 years ago

When will big file support come out?

Hello there!

The big file feature requires more research and more frequent discussion. Instead of asking this, do you think you can contribute ideas regarding any of the bullet points below? Write another comment and we (the community) will evaluate your thoughts.

leycec commented 7 years ago

@MuxZeroNet Thanks for the thorough synopsis of open questions! Several are currently under discussion at the ZeroTalk thread for this topic.

Since I'd prefer this clearnet issue serve as the central hub for this topic, I've taken the liberty of copying across a few of the more notable comments on that ZeroTalk thread. If this was unspeakably bad, just let me know and I'll remove the offending quotes.

Let's do this.

  • Hashing, integrity: How can we hash big files in such a way that small segments can be verified efficiently?

The canonical solution is Merkle trees (i.e., hash trees). I'll be astonished if ZeroNet doesn't eventually adopt some variant on a hash tree for distributing and validating big files. The devil is in the details, however:

nofish ━ on Jun 09, 2017: It's planned this summer; right now I'm experimenting with Merkle trees. Conclusions so far. Pros: smaller content.json files (only one 64-byte root hash). Cons: needs to send a proof with every piece (640 bytes/piece at 1000 pieces), and slower confirmation time. So I'm not sure if it's worth it...
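
To make the 640 bytes/piece figure concrete: with SHA-512's 64-byte digests, a proof for one piece out of 1000 needs ceil(log2(1000)) = 10 sibling hashes, i.e. 10 × 64 = 640 bytes. A minimal sketch of the textbook construction (not ZeroNet code):

import hashlib

def merkle_root_and_proof(pieces, index):
    # Leaf hashes: one SHA-512 digest per piece
    level = [hashlib.sha512(p).digest() for p in pieces]
    proof = []
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        proof.append(level[index ^ 1])   # the sibling needed to verify this level
        level = [hashlib.sha512(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

pieces = [bytes([i % 256]) * 1024 for i in range(1000)]  # 1000 dummy pieces
root, proof = merkle_root_and_proof(pieces, 42)
print(len(root))         # 64: the single root hash stored in content.json
print(len(proof) * 64)   # 640: proof bytes that must ship with each piece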

p2p ━ on Jun 13, 2017: IMO we can have a quick hash tree + a whole-file SHA-512 hash as the default, and an all-SHA-512 hash tree as a fallback. In content.json we just need to store these 3 hashes: the quick hash tree's root hash, the whole-file SHA-512 hash, and the all-SHA-512 hash tree's root hash. First we calculate the quick hash tree, in which much faster checksums such as CRCs are used for the leaf node hashes; then we process the whole file and calculate the whole-file SHA-512 hash. In case of attack (all CRCs are right but the whole-file SHA-512 hash is invalid), we fall back to the all-SHA-512 hash tree to find out which data piece is fake. In this way, we can address the performance problem.

Integrating IPFS into ZeroNet is strictly off-the-table for all of the obvious reasons, including:

nofish ━ on Jun 10, 2017 It [integrating IPFS] could be possible, but then we need to run separate daemon which would reduce portability, more memory usage, probably no full tor support, more connections and probably other problems.

skwerlman ━ on Jun 11, 2017 I hadn't thought about tor support, which is probably a deal breaker since IPFS is UDP-only atm. That CoC [Code of Conduct] is pretty spooky, since it seems to apply US law globally, and, assuming it's enforceable, it means the IPFS devs are susceptible to state coercion. I think you're right that IPFS isn't the right solution here.

The remainder of @MuxZeroNet's line of questioning ultimately reduces to user experience (UX). The ideal approach would be to incrementally generalize ZeroNet's existing small optional file support to gradually encapsulate all of the functionality required for big optional file support.

To do so sanely, a browser-based user interface for managing optional files is all but essential. Specifically, ZeroHello might introduce a new zite-specific context menu item (e.g., named "Files," "Share," "Details," "Content," or something similar). When clicked, this item might open a new browser tab:

In other words, I2PSnark in ZeroNet drag, à la:

[screenshot: I2PSnark UI]

Sadly, the size of even small optional files currently counts toward the 10MB zite limit. Generalizing ZeroNet's existing support from small to big optional files thus requires decoupling the size of optional files from the maximum size of the zite hosting those files.

The ideal approach is probably the incremental one: one slow, languorous pull request at a time until we're where we eventually want to be. This road is long and winding, but utopia remains in sight.

HelloZeroNet commented 7 years ago

Plan 0.5

File hashing: .piecemap.json

To avoid large content.json files, move the piece hashes to a separate file.

Example content.json

{
  "files_optional": {
    "video.mp4": {
      "sha512": "174004c131000b2c8d57a411131f59f7c75d888367c00e3fca5f17e2adf422b2",
      "size": 11227004,
      "piecemap": "video.mp4.piecemap.json"
    },
    "video.mp4.piecemap.json": {
      "sha512": "174004c131000b2c8d57a411131f59f7c75d888367c00e3fca5f17e2adf422b2",
      "size": 11227
    }
  },
  [...]
}

Example video.mp4.piecemap.json

{
 "video.mp4": {
    "piece_size": 1000000,
    "sha512_pieces": ["783afdeb186b50c696030f199d5db233270a84cd6183316be34c623e341dd85f", "0603ce08f7abb92b3840ad0cf40e95ea0b3ed3511b31524d4d70e88adba83daa", ...]
  }
}
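
A minimal sketch of how such a piecemap could be generated. It mirrors the JSON layout above; the 64-hex-character digests suggest SHA-512 truncated to 256 bits, as ZeroNet uses elsewhere:

import hashlib
import json

PIECE_SIZE = 1000000  # matches piece_size in the example above

def build_piecemap(path):
    pieces = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(PIECE_SIZE)
            if not piece:
                break
            # sha512 truncated to 64 hex chars (256 bits), as in the example
            pieces.append(hashlib.sha512(piece).hexdigest()[:64])
    return {path: {"piece_size": PIECE_SIZE, "sha512_pieces": pieces}}

print(json.dumps(build_piecemap("video.mp4"), indent=1))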

Size test with 2784 pieces

Read 1000x times:

So: msgpack, json, or json.gz?
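
The measured numbers aren't preserved above, but a sketch along these lines could reproduce the comparison (assumes the third-party msgpack package; the piecemap shape follows the example above):

import gzip
import json
import time

import msgpack  # pip install msgpack

piecemap = {"video.mp4": {"piece_size": 1000000,
                          "sha512_pieces": ["00" * 32] * 2784}}

blobs = {
    "json": json.dumps(piecemap).encode(),
    "json.gz": gzip.compress(json.dumps(piecemap).encode()),
    "msgpack": msgpack.packb(piecemap),
}
for name, blob in blobs.items():
    start = time.time()
    for _ in range(1000):  # read 1000x, as in the test above
        if name == "json":
            json.loads(blob)
        elif name == "json.gz":
            json.loads(gzip.decompress(blob))
        else:
            msgpack.unpackb(blob)
    print(name, len(blob), "bytes,", round(time.time() - start, 3), "s")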

Questions

Storage

Store as one big file

To make it fast and efficient we need sparse file support in the filesystem. It works well on ext4 by default, and on Windows 10 (probably also 7 and 8) after setting the sparse flag with fsutil sparse setflag testfile. (A sketch of piece writing follows the pros/cons below.)

Pros:

Cons:
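
A minimal sketch of the sparse-file approach described above: truncate() extends the file to its final size without allocating blocks on filesystems with sparse support, and pieces are then written at their offsets as they arrive.

import os

def write_piece(path, total_size, offset, piece):
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.truncate(total_size)  # sparse on ext4; NTFS needs the sparse flag set
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(piece)

# e.g. writing piece 5 of the 11227004-byte video.mp4 from the content.json example
write_piece("video.mp4", 11227004, 5 * 1000000, b"\x00" * 1000000)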

Store pieces as separate files

Pros:

Cons:

Uploading via web interface

WebSocket

Pros:

Cons:

Http request

Pros:

Cons:

Plan

sergei-bondarenko commented 7 years ago

Are you reinventing .torrent files?

HelloZeroNet commented 7 years ago

Same goal, but torrent files use their own non-standard encoding (bencode, never used by any other application) and the outdated, no-longer-secure SHA-1 hash, so I think it would be a mistake to use them.

antilibrary commented 7 years ago

Couldn't a native implementation of IPFS be of help?


HelloZeroNet commented 7 years ago

If we don't care about Tor network compatibility, there is a torrent plugin in the works: https://github.com/rllola/zeronet-torrent-plugin (python-ipfs is pretty incomplete atm).

I'm planning to add it as a plugin, so most of the parts will be reusable and it will make other network implementations easier.

haschimoto commented 7 years ago

Without Tor compatibility, does that mean users will be in the clear/unmasked?

alugarius commented 7 years ago

@haschimoto yes

saber28 commented 7 years ago

Storage? One big file.

Uploading? WebSocket suitable for this, or a separate HTTP POST request? HTTP POST request.

Compressed piecemap? Msgpack.

japzone1 commented 7 years ago

@antilibrary

Couldn't a native implementation of IPFS be of help?

Read @leycec's post above:

Integrating IPFS into ZeroNet is strictly off-the-table for all of the obvious reasons, including:

nofish ━ on Jun 10, 2017
It [integrating IPFS] could be possible, but then we need to run separate daemon which would reduce portability, more memory usage, probably no full tor support, more connections and probably other problems.

skwerlman ━ on Jun 11, 2017
I hadn't thought about tor support, which is probably a deal breaker since IPFS is UDP-only atm.
That CoC [Code of Conduct] is pretty spooky, since it seems to apply US law globally, and, assuming it's enforceable, it means the IPFS devs are susceptible to state coercion.
I think you're right that IPFS isn't the right solution here.

antilibrary commented 7 years ago

@japzone1 @HelloZeroNet IPFS not only already has good support for big files, but they have a whole team dedicated to improving the project. If we use IPFS we can reap all the benefits of their development. On the points raised:

My general feeling is that by reinventing the wheel on this one, we may be creating more work for the ZeroNet devs (a whole new part of the system will need to be maintained), and we are isolating ourselves by not being 'compatible' with anything else. For example, if big files are stored on IPFS, the site owner could decide to have many interfaces for his site to let users get those files; his ZeroNet site could be just one of the interfaces, and the others could be on the Tor network, on IPFS itself, or even on the clearnet.

linkerlin commented 7 years ago

@HelloZeroNet what about WebTorrent or WebRTC?

japzone1 commented 7 years ago

@antilibrary

  1. They're thinking about it, which isn't something we can wait for.
  2. They're working on it but it isn't ready, which is something we can't wait for.
  3. I won't get into that right now (I'm literally walking out the door at the moment).

Basically we can't wait for critical features, and we don't want the extra overhead.

@linkerlin People are already trying that, but the critical flaw we've found is getting people to seed. People either have to leave a tab open or download a special client; neither is practical for most people. Plus, there's no easy way to hide people's identity with WebTorrent.

HelloZeroNet commented 7 years ago

Exchanging who has the pieces we are looking for.

Use the same hashfield we are currently using for optional files

Pros:

Cons:

Assume everyone has downloaded the whole file

Keep trying until we find someone who has the piece.

Pros:

Cons:

Add a new per-file piecefield

Pros:

Cons:

Storage of piecefield

piecefield = "1"*1000
piecefield += "0"*500
piecefield += "1"*1000
piecefield += "0"*2500
# 1 means downloaded, 0 means not downloaded
# So: there are 5000 pieces; the first 1000 are downloaded, and another 1000 pieces after the 1500th

Storage as int

int(piecefield, 2) # sys.getsizeof: 680, msgpack: long too big to convert

Compress it with zlib

zlib.compress(piecefield, 1) # sys.getsizeof: 75, msgpack: 57

Using a custom zlib compressor: compressor = zlib.compressobj(1, zlib.DEFLATED, -15, 1, 3) # sys.getsizeof: 48, msgpack: 28
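
For completeness, a runnable Python 3 version of the custom-compressor variant (the string must be encoded to bytes under Python 3, and strategy 3 is Z_RLE, which suits the long runs of 0s and 1s):

import zlib

piecefield = "1" * 1000 + "0" * 500 + "1" * 1000 + "0" * 2500

# Raw deflate stream (wbits=-15), level 1, memLevel 1, strategy 3 (Z_RLE)
compressor = zlib.compressobj(1, zlib.DEFLATED, -15, 1, 3)
packed = compressor.compress(piecefield.encode()) + compressor.flush()
print(len(packed))  # a few dozen bytes for the 5000-piece field

# Round-trip check: decompress the raw deflate stream
assert zlib.decompress(packed, -15).decode() == piecefield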

tlightsky commented 7 years ago

@HelloZeroNet would a platform like Sia also be considered?

HelloZeroNet commented 7 years ago

On Sia you have to pay for storage. Because of that, I think it's not suitable for most of the use cases we need.

HelloZeroNet commented 7 years ago

Status update:

As usual, it's a bigger task than I originally thought, but I'm getting there. I just did a successful video stream between two clients:

Done:

Still left:

Questions:

skwerlman commented 7 years ago

Should it download big files if "download all files" is checked in the sidebar? Should big files count toward the optional files limit? If yes, then it may easily auto-delete every optional file you downloaded before and/or the large files you downloaded

imo, big files should be treated totally separately from optional files in the UI, since they are conceptually different

sergei-bondarenko commented 7 years ago

I vote in favor of two file categories: required and optional. Optional files in the current implementation must go, replaced by big files (all optional files would be "big").

HelloZeroNet commented 7 years ago

@grez911 Not sure what you mean. The big file support is built on the optional files feature.

MuxZeroNet commented 7 years ago

Big file support demo at ZeroNet Meetup 2017: https://www.youtube.com/watch?v=U01L7GS30MA&t=820s