HelloZeroNet commented 4 years ago

Content addressed data access

Why?

To de-duplicate files between sites.
Allow better site archiving
Avoid data loss on site moderation/changes

What?

Store and access based on file's hash (or merkle root for big files)

How?

File storage

data/__static__/[download_date]/[filename].[ext]

[ ] Possible alternative #1: data/__static__/[download_date]/[hash].[ext]
[ ] Possible alternative #2: data/__static__/[download_date]/[partial_hash].[ext]
[ ] Possible alternative #3: data/__static__/[partial_hash]/[hash].[ext]

Possible alternative to static content root directory (instead of data/__static__/):

[ ] data-static/
[ ] data/__immutable__/

Variables:

download_date (example: 2019-09-05): To avoid the per-directory file number limit and make the files easier to find.
hash: The merkle root of the file (sha512t256)
partial_hash: The first 8 characters of the hash, to reduce the path length (incremental postfix could be required on file name collision)
filename: File name (The first requested, may vary between sites) (incremental postfix could be required on file name collision)
ext: File extension (The first requested, may vary between sites)
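
A minimal sketch of how these variables could combine into the storage layouts listed above. It assumes that sha512t256 means a SHA-512 digest truncated to its first 256 bits, and it hashes the whole content in place of a real merkle root; the function and layout names are hypothetical.

```python
import hashlib
from datetime import date


def sha512t256(data: bytes) -> str:
    """Assumed meaning of sha512t256: SHA-512 truncated to its first 256 bits."""
    return hashlib.sha512(data).hexdigest()[:64]


def static_path(file_hash: str, filename: str, ext: str,
                download_date: date, layout: str = "filename") -> str:
    """Build a storage path for one of the layouts listed in the proposal."""
    root = "data/__static__"
    day = download_date.isoformat()  # for example 2019-09-05
    partial = file_hash[:8]          # partial_hash: first 8 characters
    layouts = {
        "filename": f"{root}/{day}/{filename}.{ext}",           # main proposal
        "date_hash": f"{root}/{day}/{file_hash}.{ext}",         # alternative #1
        "date_partial": f"{root}/{day}/{partial}.{ext}",        # alternative #2
        "partial_hash": f"{root}/{partial}/{file_hash}.{ext}",  # alternative #3
    }
    return layouts[layout]
```

Collision handling (the incremental postfix) is left out of the sketch.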

Url access

http://127.0.0.1:43110/f/[hash].[ext] (for non-big files)
http://127.0.0.1:43110/bf/[hash].[ext] (for big files)

A file name can optionally be appended, but the hash does not depend on the filename:

http://127.0.0.1:43110/f/[hash]/[anyfilename].[ext]
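
A rough sketch of how a request path in these forms could be split into its parts. It assumes a 64-character lowercase hex hash and takes only the path part of the URL; `StaticRef` and `parse_static_url` are hypothetical names, not part of ZeroNet.

```python
import re
from typing import NamedTuple, Optional


class StaticRef(NamedTuple):
    big: bool            # True for /bf/ (big file), False for /f/
    hash: str            # 64 hex characters (assumed sha512t256 in hex)
    name: Optional[str]  # optional file name, ignored for addressing
    ext: Optional[str]   # optional extension


def parse_static_url(path: str) -> StaticRef:
    """Parse /f/[hash].[ext], /bf/[hash].[ext] or /f/[hash]/[anyfilename].[ext]."""
    parts = path.strip("/").split("/")
    if len(parts) not in (2, 3) or parts[0] not in ("f", "bf"):
        raise ValueError(f"not a static file path: {path!r}")
    if len(parts) == 2:
        # /f/[hash].[ext]: the extension follows the hash directly
        hash_hex, _, ext = parts[1].partition(".")
        name = None
    else:
        # /f/[hash]/[anyfilename].[ext]: the file name is only cosmetic
        hash_hex, name = parts[1], parts[2]
        ext = name.rpartition(".")[2] if "." in name else None
    if not re.fullmatch(r"[0-9a-f]{64}", hash_hex):
        raise ValueError(f"invalid hash: {hash_hex!r}")
    return StaticRef(parts[0] == "bf", hash_hex, name, ext or None)
```

Two paths that differ only in the file name resolve to the same hash, which is the point of the optional-name form.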

File upload

Create an interface similar to big file upload (XMLHttpRequest based)
Scan directory: data/__static__/__add__: Copy the file to this directory, visit the ZeroHello Files tab, and click on "Hash added files"

File download process

Find possible peers with site-local findHashId/getHashfield / trackers
For big files: Download piecefield.msgpack
Use normal getFile to download the file/pieces (use sha512 in the request instead of the site/inner_path)

Directory upload

For directory uploads we need to generate a content.json that contains the reference to other files. Basically these would be sites where the content.json is authenticated by sha512t instead of the public address of the owner.

Example:

{
    "title": "Directory name",
    "files_link": {
        "any_file.jpg": {"link": "/f/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754", "size": 3242},
        "other_dir/any_file.jpg": {"link": "/bf/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754", "size": 3821232}
    }
}

These directories can be accessed on the web interface using http://127.0.0.1:43110/d/{sha512t hash of generated content.json}/any_file.jpg (file list can be displayed on directory access)

Downloaded files and content.json are stored in the data/__static__/[download_date]/{Directory name} directory.

Each file in the directory is also accessible using http://127.0.0.1:43110/f/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754/any_file.jpg

As an optimization, if the files are accessed using a directory reference, the peer list can be fetched using findHashId/getHashId from other peers without accessing the trackers.
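
A small sketch of how the directory content.json and its /d/ address could be derived. The proposal does not say how the content.json is serialized before hashing, so the canonical form used here (sorted keys, compact separators) is an assumption, as are the function names and the reading of sha512t as SHA-512 truncated to 256 bits.

```python
import hashlib
import json
from typing import Dict, Tuple


def build_directory_manifest(title: str, files: Dict[str, Tuple[str, int]]) -> dict:
    """files maps an inner path to (link, size), as in the example above."""
    return {
        "title": title,
        "files_link": {
            path: {"link": link, "size": size}
            for path, (link, size) in files.items()
        },
    }


def directory_hash(manifest: dict) -> str:
    # Assumption: hash a canonical JSON form so that every peer gets the same digest.
    raw = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha512(raw).hexdigest()[:64]


def directory_url(manifest: dict, inner_path: str = "") -> str:
    return f"http://127.0.0.1:43110/d/{directory_hash(manifest)}/{inner_path}"
```

Because the address is derived from the manifest itself, changing any link or size produces a different directory address.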

Possible problems

Too many tracker requests

Announcing and keeping track of peers for a large number (10k+) of files can be problematic.

Solution #1

Send tracker requests only for large (10MB+) files. To get the peer list for smaller files we use the current getHashfield / findHashId solution.

Cons:

It could be hard/impossible to find peers for small files if you are not connected to a site where that file is popular.
Hash collisions, as we use only the first 4 characters of the hash in the hashfield
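
To get a feeling for the collision risk mentioned above, here is a rough birthday-problem estimate. It assumes the 4 characters are hex digits (65536 possible prefixes) and that hashes are uniformly distributed; the function name is made up for this sketch.

```python
def collision_probability(num_files: int, prefix_space: int = 16 ** 4) -> float:
    """Probability that at least two of num_files hashes share the same prefix."""
    p_all_distinct = 1.0
    for i in range(num_files):
        # The i-th hash must avoid the i prefixes that are already taken.
        p_all_distinct *= max(0.0, 1.0 - i / prefix_space)
    return 1.0 - p_all_distinct
```

Under these assumptions a collision becomes more likely than not at only a few hundred files, so a hashfield match can only be treated as a hint that a peer may have the file.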

Solution #2

Announce all files to zero:// trackers and reduce the re-announce time to e.g. 4 hours (re-announce within 1 minute if a new file is added). Sending this number of requests to BitTorrent trackers could be problematic. Don't store peers for files that you have 100% downloaded.

Request for 10k files: 32 * 10k = 320k (optimal case)

Possible optimization #1:

Change the tracker communication so that the client requests an ID token and then only communicates hash additions / deletions until the token's expiry time. The token's expiry time is extended with every request.
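
A toy in-memory sketch of this token idea, to make the flow concrete. Nothing here is an existing tracker API: the `Tracker` class, the `full` flag and the rule that an expired token forces a full re-announce are all assumptions.

```python
import time
from typing import Dict, Iterable, List, Optional, Set, Tuple


def hash_delta(previous: Set[str], current: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) hashes between two announces."""
    return sorted(current - previous), sorted(previous - current)


class Tracker:
    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self.sessions: Dict[str, Tuple[float, Set[str]]] = {}  # token -> (expiry, hashes)

    def announce(self, token: str, added: Iterable[str] = (),
                 removed: Iterable[str] = (), full: bool = False,
                 now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        session = self.sessions.get(token)
        expired = session is None or session[0] < now
        if expired and not full:
            return False  # unknown or expired token: client must resend everything
        hashes = set() if (expired or full) else session[1]
        hashes = (hashes | set(added)) - set(removed)
        self.sessions[token] = (now + self.ttl, hashes)  # every request extends the expiry
        return True
```

After the first full announce, a client that adds one file only sends that one hash instead of its whole list.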

Possible optimization #2:

Take some risk of hash collision and allow the tracker to specify how many characters it needs from the hashes (based on how many hashes it stores).

Estimated request size to announce 22k files:

Full hash (32bytes): 770k
First 6 bytes (should be good until 10m hashes): 153k
First 7 bytes (should be good until 2560m hashes): 175k
First 8 bytes (should be good until 655360m hashes): 197k
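
For comparison, the raw payload (number of hashes times prefix length) can be computed as below. The estimates listed above are somewhat larger than this raw product; presumably they include serialization overhead, but that is a guess, as the proposal does not say how they were measured. The function name is hypothetical.

```python
def raw_announce_size(num_files: int, prefix_bytes: int = 32) -> int:
    """Bytes needed for the hash prefixes alone, without any encoding overhead."""
    return num_files * prefix_bytes


def savings_vs_full(num_files: int, prefix_bytes: int) -> float:
    """Fraction of the raw payload saved by sending prefixes instead of full 32-byte hashes."""
    full = raw_announce_size(num_files, 32)
    return 1.0 - raw_announce_size(num_files, prefix_bytes) / full
```

The first function reproduces the 32 * 10k = 320k figure for the optimal case; shortening the hashes to 6 to 8 bytes cuts the raw payload to roughly a fifth or a quarter of that.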

Cons:

Depends on the zero:// trackers
Heavy requests, more CPU/BW load to trackers

Download all optional files / help initial seed for specific user

Downloading all optional files of a site, or all files uploaded by a specific user, won't be possible anymore: the optional files will no longer be stored in the files_optional node of the user's content.json file.

Solution #1

Add a files_link node to content.json that stores the files uploaded in the last X days (with sha512, ext, size, date_added nodes).
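
A possible shape for that node, as a sketch only. It assumes files_link is a list of entries and that date_added is a Unix timestamp; neither is specified in the proposal, and the helper names are hypothetical.

```python
from typing import Dict, List, Union

SECONDS_PER_DAY = 86400
Entry = Dict[str, Union[str, int]]


def make_entry(sha512: str, ext: str, size: int, date_added: int) -> Entry:
    """One files_link entry with the four nodes named in the proposal."""
    return {"sha512": sha512, "ext": ext, "size": size, "date_added": date_added}


def prune_files_link(entries: List[Entry], now: int, max_age_days: int) -> List[Entry]:
    """Keep only the entries uploaded in the last max_age_days days."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [entry for entry in entries if entry["date_added"] >= cutoff]
```

A client that wants to help seed a user's recent uploads could then walk the pruned list and request each hash.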

blurHY commented 4 years ago

Why not directly abandon the protocol ?

Never do duplicating work please.

Starting from content-addressing, are you going to implement DHT and other stuff that is already on IPFS ?

What's your opinion on IPZN ?

BTW, you can find me on telegram @blurhy

HelloZeroNet commented 4 years ago

Adding IPFS protocol support is also a possible option, but I don't want to depend on an external application.

DHT support with many uploaded files would be very inefficient: Eg. if you want to announce your IP to 100 000 files, then you have to connect to 1000s of different computers, because the DHT buckets are distributed between random computers.
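
A back-of-the-envelope sketch of this argument. It uses a simplified model in which each file hash is handled by one uniformly random node; real DHTs replicate each key to several nodes and need extra lookup hops, so this is an illustration under stated assumptions, not a measurement.

```python
def expected_distinct_nodes(num_files: int, num_nodes: int) -> float:
    """Expected number of different nodes that must be contacted to announce num_files hashes."""
    # Each node is missed by a single hash with probability (1 - 1/num_nodes).
    p_node_untouched = (1.0 - 1.0 / num_nodes) ** num_files
    return num_nodes * (1.0 - p_node_untouched)
```

In this model, once the network has many more nodes than you have files, nearly every file lands on a different node, so the number of connections grows almost linearly with the number of files.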

blurHY commented 4 years ago

So you don't agree on modularity ?

Adding IPFS protocol support

Instead of saying adding ipfs support, I'd say it's a radical change

blurHY commented 4 years ago

DHT support with many uploaded files would be very inefficient: Eg. if you want to announce your IP to 100 000 files, then you have to connect to 1000s of different computers, because the DHT buckets are distributed between random computers.

What's ZeroNet for ?

Anti-censorship ? Right ?

There's a saying, 'grapes are sour when you can't eat them'

purplesyringa commented 4 years ago

While modularity is important, using IPFS as the main backend doesn't look good to me. One of the reasons is depending on an external tool (that's what nofish said). We can never be sure there are no radical changes that might make ZeroNet stop working. Also, we'd have to spend much more time switching to IPFS and making it compatible with classic ZeroNet than we would if we just tuned the classic protocol a bit.

purplesyringa commented 4 years ago

I might want to reword that better: I'm not against an IPFS-based system, but that shouldn't be connected to ZeroNet.

blurHY commented 4 years ago

tuned the classic protocol a bit.

Are you sure ? Anyways, a bit is not enough, or IPFS wouldn't have so much code.

that shouldn't be connected to ZeroNet.

Yeah, but I will be connected to IPZN

We can never be sure there are no radical changes that might make ZeroNet stop working

Basically, impossible.

purplesyringa commented 4 years ago

Anyways, a bit is not enough, or IPFS wouldn't have so much code.

It is.

Yeah, but I will be connected to IPZN

Sure, you can develop a decentralized system yourself, but don't call it ZeroNet. If it turns out to be better, we'll switch.

Basically, impossible.

POSIX is going to be alive for quite a long while. Same with Python 3.7.

purplesyringa commented 4 years ago

Additionally, I'm not quite sure but I believe that IPFS + IPZN is slower than classic ZeroNet.

blurHY commented 4 years ago

It is.

It isn't.

Sure, you can develop a decentralized system yourself, but don't call it ZeroNet. If it turns out to be better, we'll switch.

I don't want to call it ZeroNet

I'm not quite sure but I believe that IPFS + IPZN is slower than classic ZeroNet.

It depends on IPFS. Do you know that DHT is not the only option for routing in IPFS ?

@mkg20001

purplesyringa commented 4 years ago

It isn't.

It is. We have a rather big base for new features.

I don't want to call it ZeroNet

Ok, then don't say that IPZN is better than ZeroNet. It might be better than ZeroNet in the future if you finish it. @krixano and me worked on another decentralized system that could possibly better (though somewhat non-classical), but we didn't end up implementing it. We didn't advertise it here and there.

It depends on IPFS. Do you know that DHT is not the only option for routing in IPFS ?

Quite probably, but adding DHT (and others) to ZeroNet should be easier than switching to a completely different architecture.

blurHY commented 4 years ago

a rather big base for new features.

But why does IPFS have so much code ?

Is the code unneceasry ? No.

on another decentralized system that could possibly better

What ?

We didn't advertise it here and there.

As a result, I know nothing about your project.

to ZeroNet should be easier than switching to a completely different architecture.

Maybe, but as I said, are the tons of code of IPFS unnecessary ?

It implies there're a lot features need to be done.

When you find all the features are done, you will also realize you re-implemented IPFS.

I mean IPFS has more features and we should not do duplicating work, just switch to IPFS.

So I'd rather re-implement application layer code instead of lower layer code.

That's easier

purplesyringa commented 4 years ago

Is the code unneceasry ? No.

It unneceasry (sic) for ZeroNet's usecases.

What ?

A typo, sorry; it should be "possibly be better".

As a result, I know nothing about your project.

Yes, that's what I'm talking about! Don't announce before an alpha version, we don't want another BolgenOS.

Maybe, but as I said, are the tons of code of IPFS unnecessary ?

Some of them are unnecessary for ZeroNet usecases.

It implies there're a lot features need to be done.

See above.

That's easier

That's how it works today: add 10 levels of abstraction and let it figure itself out! We should think about being better, not being easy to build.

blurHY commented 4 years ago

It unneceasry (sic) for ZeroNet's usecases.

Do you want more awesome features ?

Don't announce before an alpha version

We need ideas and improvements on paperwork to achieve a better design

We should think about being better, not being easy to build.

Modularity is better as well as easy to build

another decentralized system

What's that project

purplesyringa commented 4 years ago

Do you want more awesome features ?

List them. I believe that most of them can be easily implemented in classic ZeroNet and even more easily after adding content-addressed files.

We need ideas and improvements on paperwork to achieve a better design

It looks like you learned a new buzzword "IPFS" and now you're saying "IPFS supports more features, go use IPFS!" First, say what you're missing and how rewriting all ZeroNet code to support IPFS will be faster or easier (that's what you're appealing to) than adding them as classic ZeroNet plugins.

Modularity is better as well as easy to build

We don't want to depend on an external service. We could separate ZeroNet to backend and frontend later when we grow bigger, but we can't just take someone else's project and use it, mostly because we can't add features/fix bugs if IPFS guys won't like that.

What's that project

This is not related to ZeroNet mostly, so I'll keep short. Think of it as a decentralized gopher-like system.

blurHY commented 4 years ago

List them

For example, FileCoin.

mostly because we can't add features/fix bugs

Why don't you add more features to tcp/ip/http/https ?

It looks like you learned a new buzzword "IPFS"

I doubt if you have ever read about IPFS ?

purplesyringa commented 4 years ago

For example, FileCoin

Another non-buzzword one please. And even then, FileCoin can be implemented as a plugin.

Why don't you add more features to tcp/ip/http/https ?

Is this sarcasm?

There's no room for improvements in TCP anymore;
IPv6 is improved IPv4, but I doubt I can think of an IP improvement;
HTTP is being improved, look at HTTP 2.0, look at how keep-alive connections were added, look at CORS, etc.;
HTTPS can't be improved by definition because it's just a thin layer on SSL/TLS. SSL and TLS are being improved AFAIK. But, again, I doubt I can think of an SSL/TLS improvement.

I doubt if you have ever read about IPFS ?

Sure I did. Please don't ignore my questions and answer: what IPFS features can't be added to ZeroNet?

blurHY commented 4 years ago

FileCoin can be implemented as a plugin.

Huh, do you think you guys have enough effort ?

sarcasm

Yeah, of course.

I mean your concern is nonsense and will never happen, because it's infrastructure like http.

what IPFS features can't be added to ZeroNet?

Nonsense

mkg20001 commented 4 years ago

IPLD claims to be a "merkle forest" that supports all datatypes. Implementing IPLD into ZeroNet would therefore require first writing IPLD-compatible data types to add the ZeroNet objects into the IPLD layer. Thus we'd have to integrate ZeroNet into IPLD anyways, and this discussion is IMO completely pointless. Additionally, we have a p2p framework that tries to solve the needs of everyone, so people can focus on their apps and not the network stuff, called libp2p. Ethereum recently made the switch and ZeroNet could do that as well, since if anything's missing in libp2p, it can simply be added, thus squaring the value of the framework for both sides.

Thus I find it entirely pointless to fight over what's best. My point is: let's join together instead of fighting, so I created the idea of adding ZeroNet into IPLD, which I tried to achieve with ZeroNet-JS (but gave up since summer holidays were over 😅). What could possibly go wrong? In the end, if we find a way to add layers to libp2p to circumvent gfw by hiding it in plain HTTP traffic, it would benefit every p2p app, not just ZeroNet. So we don't need 3 wheels if we can all work on one for everyone.

purplesyringa commented 4 years ago

Huh, do you think you guys have enough effort ?

Don't make us do what you want to. Do it yourself: either write your own network or bring features to ours.

I mean your concern is nonsense and will never happen, because it's infrastructure like http.

What the heck, merger sites were added, optional files were added, big files were added!...

Nonsense

It looks like a classic "no u".

purplesyringa commented 4 years ago

@mkg20001 Your arguments look better. While I wouldn't use IPFS, libp2p might be a better solution because it's at least used by many projects, so it's unlikely that breaking changes are added. So, is the plan to switch to libp2p?

blurHY commented 4 years ago

Do it yourself

Of course, opensource voluntary

merger sites were added, optional files were added, big files were added!...

You think these are features ? It's just workarounds for bad design

blurHY commented 4 years ago

While I wouldn't use IPFS

Go and read IPFS papers again

I don't know what to say.

purplesyringa commented 4 years ago

Of course, opensource voluntary

Right. nofish can't be forced to do something unless he or ZeroNet community finds it important (in the latter case, we'll either end up with forking or with convincing nofish). Go find those who like your idea and start development.

You think these are features ? It's just workarounds for bad design

Uh, what? Sure, big files might be a hotfix but how is optional/required file separation bad design?

purplesyringa commented 4 years ago

We can even start at IPFS homepage:

Take a look at what happens when you add a file to IPFS.

See? Add a file. ZeroNet is not just about files: PeerMessage works without files and should never be.

blurHY commented 4 years ago

PeerMessage works without files and should never be.

Huh, you definitely not knowing about pubsub and IPFS's plan of dynamic web

purplesyringa commented 4 years ago

Huh, you definitely not knowing about pubsub and IPFS's plan of dynamic web

Quite probable. Now show me a working implementation of pubsub and the IPFS-like dynamic web in Python.

purplesyringa commented 4 years ago

You were saying "well pubsub is not ready yet" and "IPFS development is slow", and now you're asking us to switch to something that's not ready!

blurHY commented 4 years ago

pubsub

Take a look at https://gitlab.com/ipzn/ipzn/wikis/Notes, these are WIP IPFS features

switch to something that's not ready!

So what ? Just wait. Do you want a toy project ?

purplesyringa commented 4 years ago

So what ? Just wait.

Yes. You can't implement something before its dependencies are ready!

blurHY commented 4 years ago

how is optional/required file separation bad design?

Because of ZeroNet's default behaviour: keeping all files and never delete them.

How is optional file feature hard to implement ?

purplesyringa commented 4 years ago

Because of ZeroNet's default behaviour: keeping all files and never delete them.

That's because it used to be a correct solution back then, when ZeroNet was small. You can't just make a project and say "it's finished", you have to adapt it endlessly.

How is optional file feature hard to implement ?

It's not hard, nofish implemented it when it started being important.

blurHY commented 4 years ago

That's because it used to be a correct solution back then, when ZeroNet was small. You can't just make a project and say "it's finished", you have to adapt it endlessly.

So I am going to start a project from scratch, and based on IPFS, for a better design

blurHY commented 4 years ago

add layers to libp2p to circumvent gfw by hiding it in plain HTTP traffic

As IPFS plugin maybe

purplesyringa commented 4 years ago

Let's start tracing your point:

You want to use IPFS as backend;
Adding IPFS is better because of modular architecture;
Modular architecture allows adding features separately.

Right?

blurHY commented 4 years ago

Let's start tracing your point:

* You want to use IPFS as backend;

* Adding IPFS is better because of modular architecture;

* Modular architecture allows adding features separately.

Right?

And IPFS has a team working on it, possibly full-time

purplesyringa commented 4 years ago

Great. So the only reason to switch to IPFS is because of more features. Now list them for us -- dumb people who are too silly to understand it.

mkg20001 commented 4 years ago

@imachug The plan for ZeroNet-JS was to have both znv2 (the zeronet msgpack RPC protocol) and a custom one on top of libp2p. Additionally, the plan for IPLD-integration was to store objects locally using IPLD (instead of directly using the FS) and then exchange them with other ZeroNet clients as "plain" ZeroNet objects. (Also I had an idea to replace the SQL for multi-user with something that has SQL-syntax but doesn't do as much I/O and computation as sqlite, to fix performance.) That way we can experiment with new things without breaking compatibility with the "mainnet" too much.

The reason why I even started this project is because, from my point of view, it looked like this:

IPLD: "Let's build bridges, not walls, by building a common base-implementation for all kinds of DAGs"
ZeroNet: "Let's run our own torrent system called bigfiles"

libp2p: "Let's build upon common standards where possible, to keep problems with compatibility at a minimum"
ZeroNet: "Let's re-invent mDNS for discovery, because, heck, we can"

If we continue down that path (with all of p2p, not just ZeroNet), we'll just have another siloed problem, only at another point of the protocol.

If we combine our forces, through efforts such as multiformats (which tries to "support it all") then we'll have a truly decentralized internet.

blurHY commented 4 years ago

Great. So the only reason to switch to IPFS is because of more features. Now list them for us -- dumb people who are too silly to understand it.

Wait. It took me about two months to get an overview of IPFS, so explaining how great IPFS is in a few words is not possible.

You basically don't want to admit that ZeroNet is not the best one nowadays

blurHY commented 4 years ago

custom one on top of libp2p

No, IPFS has pubsub now

blurHY commented 4 years ago

I think more introductions/docs are needed for IPZN.

Too much misunderstanding.

mkg20001 commented 4 years ago

@blurHY We literally don't have clear goals with IPZN defined yet. I don't even fully understand it. It's better to focus on extensibility with maintained compatibility than on "starting from scratch", or IPZN won't be better than any of the projects it claimed to replace. After all, we're building bridges not walls.

purplesyringa commented 4 years ago

Wait. It took me about two months to get an overview of IPFS, so explaining how great IPFS is in a few words is not possible.

Seriously? libp2p feature is "a small layer for building huge decentralized apps", ZeroNet is "we have sites, sites are stored by everyone who wants to store them, you can download a site and even post comments". What's IPFS feature?

You basically don't want to admit that ZeroNet is not the best one nowadays

I understand that ZeroNet might not be going in the correct direction, but I'm pretty sure that using IPFS is not the correct way.

No, IPFS has pubsub now

See, IPFS created pubsub and it's not compatible with other protocols! Wow!

I think more introductions/docs are needed for IPZN.

That's right.

blurHY commented 4 years ago

We literally don't have clear goals with IPZN defined yet

I have some free time tomorrow to write the docs.

You can understand IPFS as a global filesystem and there's a communicator among all peers.

In this way, you can almost build any type of decentralized application.

I call the ZeroNet support on IPZN as bridge because ZeroNet may not support some features of IPFS, so it's partly compatible.

What's IPFS feature?

a global filesystem and a communicator among all peers

All decentralized web can be broken down as these two parts

See, IPFS created pubsub and it's not compatible with other protocols! Wow!

We can have a bridge to ZeroNet by a custom protocol on libp2p

mkg20001 commented 4 years ago

Just to clarify: The reason why I'm all behind integrating/building on top of IPLD/libp2p is that those projects try to be "compatible" by design, for example by allowing the DHT or pubsub implementation to be swapped out as needed, while on the other hand we have custom-built protocol stacks that make fixed assumptions and thus are harder to connect with each other (Edit: That is, aside from not having to re-invent the wheel)

Also, py-libp2p made substantial progress and should be ready to use quite soon, so that's something worth taking a look.

purplesyringa commented 4 years ago

Hm. A big part of the world-wide community switched to libp2p, so using it might make sense (one of the reasons is making government blocks harder). But I think that most (if not all) IPFS features are available (or will be soon) in ZeroNet.

blurHY commented 4 years ago

try to be "compatible" by design

When you always want to have compatibility, there's no space left for innovations then.

However I don't mean IPZN is not compatible, just partly

blurHY commented 4 years ago

I think that most (if not all) IPFS features are available (or will be soon) in ZeroNet.

Why do you want to copy their features ?

purplesyringa commented 4 years ago

Why do you want to copy their features ?

Reread my comment please. See, I said that most features are available right now, and switching to IPFS will take a lot more time than just adding one or two features. BTW, you didn't say how IPFS is better than all other distributed file systems (e.g. ZeroNet).

blurHY commented 4 years ago

BTW, you didn't say how IPFS is better than all other distributed file systems (e.g. ZeroNet).

Reread my comments too.

switching to IPFS will take a lot more time

More time ? Do you think the application layer is harder than infrastructure ?

HelloZeroNet / ZeroNet

Proposal: Content addressed data #2192
