HelloZeroNet / ZeroNet

ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network
https://zeronet.io

Big file support #7

Closed: HelloZeroNet closed this issue 7 years ago

HelloZeroNet commented 9 years ago

Torrent-like file splitting and distribution

HelloZeroNet commented 9 years ago

Maybe this needs to wait until DHT support lands, so we are able to find peers for the needed file parts

up4 commented 8 years ago

Hi, me again. Two questions:

  1. Why, precisely (in terms of actual code), are big/large files not supported (many references to "more than 1MB")?
  2. Why does it "need to wait until DHT support"?

Thanks!

HelloZeroNet commented 8 years ago

Some blockers for big files:

Optional file support was added around 6 months ago, so DHT is no longer required for this. ZeroNet was created for dynamic websites, and most site types do not require big file support, so it's not a priority yet. Maybe adding a torrent client as a plugin is a better solution.

up4 commented 8 years ago

Hi!

No harsh feelings against you personally, but I hold a solid grudge against the "please spare Tor for important text-based traffic from the third world" argument. Very pre-Nietzschean as a moral argument; I just can't. Plus, I have discussed it ad nauseam in ZeroChat FR (I think). If I were to fork this repo, modify the storage mechanism (and the download progress interface) without touching the protocol itself, I think people would prefer that version over the official ZeroNet stack because it just makes more sense technically. I'd rather do that than an optional plugin. I'm going to make a pull request when I'm done, and I'm going to let you decide whether you want it in your repo or not.

I just hope we can make this work.

HelloZeroNet commented 8 years ago

A plugin would be nice. I'm planning to do a plugin management interface later that makes it easy to add/remove/install the non-default features.

PepinLeBref commented 8 years ago

Isn't it possible to use Tribler's onion routing for the big files?

HelloZeroNet commented 8 years ago

It's probably possible if we use torrents for big files.

zeronetscript commented 8 years ago

Hello, I've created a p2p stream helper here: https://github.com/zeronetscript/universal_p2p. With this helper running on the visitor's PC, any ZeroNet web page can stream a resource on the fly from BitTorrent in a simple way (a demo site has already been tested by some visitors).

For example, use a normal HTML5 video tag and point its src to

http://127.0.0.1:7788/bittorrent/v0/stream/c12fe1c06bba254a9dc9f519b335aa7c1367a88a/video.mp4

This HTTP request makes the helper's BitTorrent backend download the torrent by infohash and stream the video.mp4 file to the client. With this helper, any ZeroNet website can stream from any existing torrent's resources. I plan to support more p2p backends (for example http://127.0.0.1:7788/ipfs/xxx) and more convenient functions (e.g. directly streaming a file inside a zip archive).
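
For illustration, a minimal Python sketch of a client consuming that endpoint; the URL and infohash come from the demo above, and it assumes the universal_p2p helper is running locally on port 7788:

import urllib.request

# URL scheme from the demo above: /<backend>/<version>/stream/<infohash>/<path>
url = ("http://127.0.0.1:7788/bittorrent/v0/stream/"
       "c12fe1c06bba254a9dc9f519b335aa7c1367a88a/video.mp4")

# The helper resolves the infohash, fetches pieces from the swarm, and
# streams the file over plain HTTP, so any HTTP client (or a <video> tag)
# can consume it progressively.
with urllib.request.urlopen(url) as response:
    first_chunk = response.read(64 * 1024)
print("received", len(first_chunk), "bytes")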

I'd also suggest integrating Tribler into ZeroNet, working the same way as my helper. This would make any existing BitTorrent resource accessible without special pages. The protocol prefix in the URL reserves the ability to support different backends, and keeping a protocol version in it makes protocol upgrades easy.

up4 commented 8 years ago

Solution proposed and being worked on here (I will commit real code in the next couple of days): pull request #521.

alugarius commented 8 years ago

@zeronetscript I really support the IPFS idea. Instead of working against it, we should include IPFS in the future; the network will become amazing. IPFS is being ported to JavaScript and Python, so it should be possible... well, most people do it already, sharing files over IPFS on 0chan for example.

First suggestion: include IPFS in the bundle, started together with ZeroNet, giving more options to site creators. We have to live this dream! ^^

HelloZeroNet commented 8 years ago

BitTorrent support is planned in the next 6 months, which will provide a solution for big files.

Bachstelze commented 7 years ago

If you can forgo secure Tor usage, you can use http://www.cachep2p.com/ in JavaScript.

antilibrary commented 7 years ago

Having a native implementation of IPFS makes complete sense to me as well. We would need to allow support for file packs: instead of having each individual file available for redistribution, we could have a 'folders'-like approach: all images, video course XYZ, all video courses of category ABC, books from id 1 to 100, all books in my shelves, music album DEF, all classical music albums, etc.

The packs could be defined by the site owner with a list of IPFS hashes (or a query that leads to this list). Once a pack is selected to be seeded by the user, it would be downloaded and seeded through the local IPFS daemon. The current implementation's UI is hard to use because nobody will select individual files from the lists. The user is not interested in seeding specific files; he is interested in helping the site by providing a bit of bandwidth and space.

For a site like 0chan, the site owner could pack the different categories and users could then seed a whole category (e.g. seed all files from /dev/). For a site like ZeroWiki, the same thing applies: the site owner could split the seeding into sections (seed all images, seed all pages, seed the history of all pages), or he could use wiki categories (seed all content of category ABC).

One of the benefits of having it on IPFS is that the files are also available to the opennet via the IPFS gateway. So in the case of sites like antilibrary.bit, once users start to seed the book packs, the book files will be available for everyone to download via ZeroNet gateways + the IPFS gateway (e.g. https://bit.no.com:43110/Antilibrary.bit/).

MuxZeroNet commented 7 years ago

I believe the ZeroNet protocol does a better job of protecting integrity and privacy than the original BitTorrent protocol. ZeroNet has the potential to be a more resilient file sharing network.

| Property | ZeroNet | BitTorrent |
| --- | --- | --- |
| Digest | SHA-512/256 (the truncated version) | SHA-1 (vulnerable to the BitErrant attack) |
| Signature | secp256k1 (Bitcoin) | ? |
| Encryption | TLS, on by default, unless OpenSSL is gone | Various, on by default in many clients |
| Link | Bitcoin address, ripemd160(sha256()) | Magnet URI, SHA-1 |
| File List | Signed content.json | BEncoded, not signed |
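
As an aside, the ripemd160(sha256()) link construction from the table is easy to reproduce. A minimal sketch (the public key below is a made-up placeholder, and hashlib's ripemd160 availability depends on the local OpenSSL build):

import hashlib

# Hypothetical 33-byte compressed secp256k1 public key (placeholder bytes)
pubkey = bytes.fromhex("02" + "11" * 32)

# ripemd160(sha256(pubkey)): the 20-byte hash behind a Bitcoin-style address.
# hashlib gets ripemd160 from OpenSSL, so this can raise on builds without it.
digest = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest())
print(digest.hexdigest())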

Current blocks:

HelloZeroNet commented 7 years ago

BitTorrent clients also have the advantage of ~15 years spent on optimization, and the fact that they don't have to worry about dynamic content (e.g. they can identify pieces by a simple ID; on ZeroNet you have to identify them by filename or hash).

antilibrary commented 7 years ago

We could benefit from this: https://ipfs.io/blog/23-js-ipfs-0-23/

haschimoto commented 7 years ago

Isn't the problem with the BitTorrent protocol/client for big files that you lose any protection from Tor?

funny110 commented 7 years ago

When will big file support come out?

MuxZeroNet commented 7 years ago

When will big file support come out?

Hello there!

The big file feature requires more research and more frequent discussion. Instead of asking this, do you think you can contribute ideas regarding any of the bullet points below? Write another comment and we (the community) will evaluate your thoughts.

leycec commented 7 years ago

@MuxZeroNet Thanks for the thorough synopsis of open questions! Several are currently under discussion at the ZeroTalk thread for this topic.

Since I'd prefer this clearnet issue serve as the central hub for this topic, I've taken the liberty of copying across a few of the more notable comments on that ZeroTalk thread. If this was unspeakably bad, just let me know and I'll remove the offending quotes.

Let's do this.

  • Hashing, integrity: How can we hash big files in such a way that small segments can be verified efficiently?

The canonical solution is Merkle trees (i.e., hash trees). I'll be astonished if ZeroNet doesn't eventually adopt some variant on a hash tree for distributing and validating big files. The devil is in the details, however:

nofish ━ on Jun 09, 2017: It's planned this summer; right now I'm experimenting with Merkle trees. Conclusions so far. Pros: smaller content.json files (only one 64-byte root hash). Cons: needs to send a proof with every piece (640 bytes/piece at 1000 pieces), and slower confirmation time. So I'm not sure if it's worth it...
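
To make the 640 bytes/piece figure concrete: with SHA-512's 64-byte digests, a proof for one piece out of 1000 needs ceil(log2(1000)) = 10 sibling hashes, i.e. 10 × 64 = 640 bytes. A minimal sketch of the textbook construction (not ZeroNet code):

import hashlib

def merkle_root_and_proof(pieces, index):
    # Leaf hashes: one SHA-512 digest per piece
    level = [hashlib.sha512(p).digest() for p in pieces]
    proof = []
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        proof.append(level[index ^ 1])   # the sibling needed to verify this level
        level = [hashlib.sha512(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

pieces = [bytes([i % 256]) * 1024 for i in range(1000)]  # 1000 dummy pieces
root, proof = merkle_root_and_proof(pieces, 42)
print(len(root))         # 64: the single root hash stored in content.json
print(len(proof) * 64)   # 640: proof bytes that must ship with each piece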

p2p ━ on Jun 13, 2017: IMO we can have a quick hash tree + a whole-file SHA-512 hash as the default, and an all-SHA-512 hash tree as a fallback. In content.json we just need to store these 3 hashes: the quick hash tree's root hash, the whole-file SHA-512 hash, and the all-SHA-512 hash tree's root hash. First we calculate the quick hash tree, in which much faster checksums such as CRCs are used for the leaf node hashes; then we process the whole file and calculate the whole-file SHA-512 hash. In case of attack (all CRCs are right but the whole-file SHA-512 hash is invalid), we fall back to the all-SHA-512 hash tree to find out which data piece is fake. In this way, we can address the performance problem.

Integrating IPFS into ZeroNet is strictly off-the-table for all of the obvious reasons, including:

nofish ━ on Jun 10, 2017 It [integrating IPFS] could be possible, but then we need to run separate daemon which would reduce portability, more memory usage, probably no full tor support, more connections and probably other problems.

skwerlman ━ on Jun 11, 2017 I hadn't thought about tor support, which is probably a deal breaker since IPFS is UDP-only atm. That CoC [Code of Conduct] is pretty spooky, since it seems to apply US law globally, and, assuming it's enforceable, it means the IPFS devs are susceptible to state coercion. I think you're right that IPFS isn't the right solution here.

The remainder of @MuxZeroNet's line of questioning ultimately reduces to user experience (UX). The ideal approach would be to incrementally generalize ZeroNet's existing small optional file support to gradually encapsulate all of the functionality required for big optional file support.

To do so sanely, a browser-based user interface for managing optional files is all but essential. Specifically, ZeroHello might introduce a new zite-specific context menu item (e.g., named "Files," "Share," "Details," "Content," or something similar). When clicked, this item might open a new browser tab:

In other words, I2PSnark in ZeroNet drag, à la:

[screenshot: I2PSnark UI]

Sadly, the size of even small optional files currently counts toward the 10MB zite limit. Generalizing ZeroNet's existing support from small to big optional files thus requires decoupling the size of optional files from the maximum size of the zite hosting those files.

The ideal approach is probably the incremental one: one slow, languorous pull request at a time until we're where we eventually want to be. This road is long and winding, but utopia remains in sight.

HelloZeroNet commented 7 years ago

Plan 0.5

File hashing: .piecemap.json

To avoid large content.json files, move the piece hashes to a separate file.

Example content.json

{
  "files_optional": {
    "video.mp4": {
      "sha512": "174004c131000b2c8d57a411131f59f7c75d888367c00e3fca5f17e2adf422b2",
      "size": 11227004,
      "piecemap": "video.mp4.piecemap.json"
    },
    "video.mp4.piecemap.json": {
      "sha512": "174004c131000b2c8d57a411131f59f7c75d888367c00e3fca5f17e2adf422b2",
      "size": 11227
    }
  },
  [...]
}

Example video.mp4.piecemap.json

{
 "video.mp4": {
    "piece_size": 1000000,
    "sha512_pieces": ["783afdeb186b50c696030f199d5db233270a84cd6183316be34c623e341dd85f", "0603ce08f7abb92b3840ad0cf40e95ea0b3ed3511b31524d4d70e88adba83daa", ...]
  }
}
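
A minimal sketch of how such a piecemap could be generated. It mirrors the JSON layout above; the 64-hex-character digests suggest SHA-512 truncated to 256 bits, as ZeroNet uses elsewhere:

import hashlib
import json

PIECE_SIZE = 1000000  # matches piece_size in the example above

def build_piecemap(path):
    pieces = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(PIECE_SIZE)
            if not piece:
                break
            # sha512 truncated to 64 hex chars (256 bits), as in the example
            pieces.append(hashlib.sha512(piece).hexdigest()[:64])
    return {path: {"piece_size": PIECE_SIZE, "sha512_pieces": pieces}}

print(json.dumps(build_piecemap("video.mp4"), indent=1))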

Size test with 2784 pieces

Read 1000x times:

So: msgpack, json, or json.gz?
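
The measured numbers aren't preserved above, but a sketch along these lines could reproduce the comparison (assumes the third-party msgpack package; the piecemap shape follows the example above):

import gzip
import json
import time

import msgpack  # pip install msgpack

piecemap = {"video.mp4": {"piece_size": 1000000,
                          "sha512_pieces": ["00" * 32] * 2784}}

blobs = {
    "json": json.dumps(piecemap).encode(),
    "json.gz": gzip.compress(json.dumps(piecemap).encode()),
    "msgpack": msgpack.packb(piecemap),
}
for name, blob in blobs.items():
    start = time.time()
    for _ in range(1000):  # read 1000x, as in the test above
        if name == "json":
            json.loads(blob)
        elif name == "json.gz":
            json.loads(gzip.decompress(blob))
        else:
            msgpack.unpackb(blob)
    print(name, len(blob), "bytes,", round(time.time() - start, 3), "s")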

Questions

Storage

Store as one big file

To make it fast and efficient we need sparse file support in the filesystem. It works well on ext4 by default, and on Windows 10 (probably also 7 and 8) after setting the sparse flag with fsutil sparse setflag testfile. (A sketch of piece writing follows the pros/cons below.)

Pros:

Cons:
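
A minimal sketch of the sparse-file approach described above: truncate() extends the file to its final size without allocating blocks on filesystems with sparse support, and pieces are then written at their offsets as they arrive.

import os

def write_piece(path, total_size, offset, piece):
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.truncate(total_size)  # sparse on ext4; NTFS needs the sparse flag set
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(piece)

# e.g. writing piece 5 of the 11227004-byte video.mp4 from the content.json example
write_piece("video.mp4", 11227004, 5 * 1000000, b"\x00" * 1000000)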

Store pieces as separate files

Pros:

Cons:

Uploading via web interface

WebSocket

Pros:

Cons:

Http request

Pros:

Cons:

Plan

sergei-bondarenko commented 7 years ago

Are you reinventing .torrent files?

HelloZeroNet commented 7 years ago

Same goal, but torrent files use their own non-standard encoding (bencode, never used by any other application) and the outdated, no-longer-secure SHA-1 hash, so I think it would be a mistake to use them.

antilibrary commented 7 years ago

Couldn't a native implementation of IPFS be of help?


HelloZeroNet commented 7 years ago

If we don't care about Tor network compatibility, there is a torrent plugin in the works: https://github.com/rllola/zeronet-torrent-plugin (python-ipfs is pretty incomplete atm).

I'm planning to add it as a plugin, so most of the parts will be reusable and it will make other network implementations easier.

haschimoto commented 7 years ago

Without Tor compatibility, does that mean users will be in the clear/unmasked?

alugarius commented 7 years ago

@haschimoto yes

saber28 commented 7 years ago

Storage? One big file.

Uploading? WebSocket suitable for this, or a separate HTTP POST request? HTTP POST request.

Compressed piecemap? Msgpack.

japzone1 commented 7 years ago

@antilibrary

Couldn't a native implementation of IPFS be of help?

Read @leycec's post above:

Integrating IPFS into ZeroNet is strictly off-the-table for all of the obvious reasons, including:

nofish ━ on Jun 10, 2017
It [integrating IPFS] could be possible, but then we need to run separate daemon which would reduce portability, more memory usage, probably no full tor support, more connections and probably other problems.

skwerlman ━ on Jun 11, 2017
I hadn't thought about tor support, which is probably a deal breaker since IPFS is UDP-only atm.
That CoC [Code of Conduct] is pretty spooky, since it seems to apply US law globally, and, assuming it's enforceable, it means the IPFS devs are susceptible to state coercion.
I think you're right that IPFS isn't the right solution here.

antilibrary commented 7 years ago

@japzone1 @HelloZeroNet IPFS not only already has good support for big files, but they have a whole team dedicated to improving the project. If we use IPFS we can reap all the benefits of their development. On the points raised:

My general feeling is that by reinventing the wheel on this one, we may be creating more work for the ZeroNet devs (a whole new part of the system will need to be maintained), and we are isolating ourselves by not being 'compatible' with anything else. For example, if big files are stored on IPFS, the site owner could decide to have many interfaces for his site to let users get those files; his ZeroNet site could be just one of the interfaces, and the others could be on the Tor network, on IPFS itself, or even on the clearnet.

linkerlin commented 7 years ago

@HelloZeroNet what about WebTorrent or WebRTC?

japzone1 commented 7 years ago

@antilibrary

  1. They're thinking about it, which isn't something we can wait for.
  2. They're working on it but it isn't ready, which is something we can't wait for.
  3. I won't get into that right now (I'm literally walking out the door at the moment).

Basically we can't wait for critical features, and we don't want the extra overhead.

@linkerlin People are already trying that, but the critical flaw we've found is getting people to seed. People either have to leave a tab open or download a special client; neither is practical for most people. Plus, there's no easy way to hide people's identity with WebTorrent.

HelloZeroNet commented 7 years ago

Exchanging who has the pieces we are looking for.

Use the same hashfield we are currently using for optional files

Pros:

Cons:

Assume everyone has downloaded the whole file

Keep trying until we find someone who has the piece.

Pros:

Cons:

Add a new per-file piecefield

Pros:

Cons:

Storage of piecefield

piecefield = "1"*1000
piecefield += "0"*500
piecefield += "1"*1000
piecefield += "0"*2500
# 1 means downloaded, 0 means not downloaded
# So: there are 5000 pieces; the first 1000 are downloaded, and another 1000 pieces after the 1500th

Storage as int

int(piecefield, 2) # sys.getsizeof: 680, msgpack: long too big to convert

Compress it with zlib

zlib.compress(piecefield, 1) # sys.getsizeof: 75, msgpack: 57

Using a custom zlib compressor: compressor = zlib.compressobj(1, zlib.DEFLATED, -15, 1, 3) # sys.getsizeof: 48, msgpack: 28
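
For completeness, a runnable Python 3 version of the custom-compressor variant (the string must be encoded to bytes under Python 3, and strategy 3 is Z_RLE, which suits the long runs of 0s and 1s):

import zlib

piecefield = "1" * 1000 + "0" * 500 + "1" * 1000 + "0" * 2500

# Raw deflate stream (wbits=-15), level 1, memLevel 1, strategy 3 (Z_RLE)
compressor = zlib.compressobj(1, zlib.DEFLATED, -15, 1, 3)
packed = compressor.compress(piecefield.encode()) + compressor.flush()
print(len(packed))  # a few dozen bytes for the 5000-piece field

# Round-trip check: decompress the raw deflate stream
assert zlib.decompress(packed, -15).decode() == piecefield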

tlightsky commented 7 years ago

@HelloZeroNet would a platform like Sia also be considered?

HelloZeroNet commented 7 years ago

On Sia you have to pay for storage. Because of that, I think it's not suitable for most of the use cases we need.

HelloZeroNet commented 7 years ago

Status update:

As usual, it's a bigger task than I originally thought, but I'm getting there. I just did a successful video stream between two clients:

Done:

Still left:

Questions:

skwerlman commented 7 years ago

Should it download big files if "download all files" is checked in the sidebar? Should big files count toward the optional files limit? If yes, then it may easily auto-delete every optional file you downloaded before and/or the large files you downloaded

imo, big files should be treated totally separately from optional files in the UI, since they are conceptually different

sergei-bondarenko commented 7 years ago

I vote in favor of two file categories: required and optional. Optional files in the current implementation must go, replaced by big files (all optional files would be "big").

HelloZeroNet commented 7 years ago

@grez911 Not sure what you mean. The big file support is built on the optional files feature.

MuxZeroNet commented 7 years ago

Big file support demo at ZeroNet Meetup 2017: https://www.youtube.com/watch?v=U01L7GS30MA&t=820s