HelloZeroNet commented 5 years ago

Content addressed data access

Why?

To de-duplicate files between sites.
Allow better site archiving
Avoid data loss on site moderation/changes

What?

Store and access based on file's hash (or merkle root for big files)

How?

File storage

data/__static__/[download_date]/[filename].[ext]

[ ] Possible alternative #1: data/__static__/[download_date]/[hash].[ext]
[ ] Possible alternative #2: data/__static__/[download_date]/[partial_hash].[ext]
[ ] Possible alternative #3: data/__static__/[partial_hash]/[hash].[ext]

Possible alternative to static content root directory (instead of data/__static__/):

[ ] data-static/
[ ] data/__immutable__/

Variables:

download_date (example: 2019-09-05): To avoid the per-directory file number limit and make the files easier to find.
hash: The merkle root of the file (sha512t256)
partial_hash: The first 8 characters of the hash, to reduce the path length (an incremental postfix could be required on file name collision)
filename: File name (The first requested, may vary between sites) (incremental postfix could be required on file name collision)
ext: File extension (The first requested, may vary between sites)
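
A minimal sketch of how the variables above could combine into a storage path. It assumes sha512t256 means a SHA-512 digest truncated to 256 bits (64 hex characters) and hashes a small file directly instead of building a merkle tree. The names `sha512t256` and `static_path` are hypothetical, not part of the proposal.

```python
import hashlib
import posixpath
from typing import Optional


def sha512t256(data: bytes) -> str:
    # Assumption: sha512t256 = SHA-512 digest truncated to 256 bits (64 hex chars).
    return hashlib.sha512(data).hexdigest()[:64]


def static_path(file_hash: str, ext: str,
                download_date: Optional[str] = None,
                partial_len: int = 8) -> str:
    """Build a storage path under data/__static__/.

    With download_date: alternative #1, [download_date]/[hash].[ext]
    Without it:         alternative #3, [partial_hash]/[hash].[ext]
    """
    first_dir = download_date if download_date else file_hash[:partial_len]
    return posixpath.join("data", "__static__", first_dir, f"{file_hash}.{ext}")
```

Either layout keeps the number of entries per directory bounded; the date-based one is easier to browse by hand, the partial-hash one needs no extra metadata to locate a file.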

Url access

http://127.0.0.1:43110/f/[hash].[ext] (for non-big file)
http://127.0.0.1:43110/bf/[hash].[ext] (for big file)

A file name can optionally be added, but the hash does not depend on the filename:

http://127.0.0.1:43110/f/[hash]/[anyfilename].[ext]
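
As a rough illustration, the URL forms above could be parsed like this. The regular expression, the `StaticRef` tuple and `parse_static_url` are hypothetical names, and the 64-character lowercase hex hash is an assumption based on the example hashes used later in this proposal.

```python
import re
from typing import NamedTuple, Optional

# /f/[hash].[ext], /bf/[hash].[ext] or /f/[hash]/[anyfilename].[ext]
STATIC_URL = re.compile(
    r"^/(?P<kind>f|bf)/(?P<hash>[0-9a-f]{64})"
    r"(?:\.(?P<ext>[^/.]+)|/(?P<filename>[^/]+))?$"
)


class StaticRef(NamedTuple):
    big_file: bool
    file_hash: str
    ext: Optional[str]
    filename: Optional[str]


def parse_static_url(path: str) -> Optional[StaticRef]:
    match = STATIC_URL.match(path)
    if match is None:
        return None
    filename = match.group("filename")
    ext = match.group("ext")
    if filename and "." in filename:
        # The file name is only a label; take the extension from it.
        ext = filename.rsplit(".", 1)[1]
    return StaticRef(match.group("kind") == "bf", match.group("hash"), ext, filename)
```

Two URLs that differ only in the optional file name resolve to the same `file_hash`, which is the property the proposal asks for.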

File upload

Create an interface similar to big file upload (XMLHttpRequest based)
Scan directory: data/__static__/__add__: Copy file to this directory, visit ZeroHello Files tab, Click on "Hash added files"

File download process

Find possible peers with site-local findHashId/getHashfield / trackers
For big files: Download piecefield.msgpack
Use normal getFile to download the file/pieces (use sha512 in the request instead of the site/inner_path)

Directory upload

For directory uploads we need to generate a content.json that contains the reference to other files. Basically these would be sites where the content.json is authenticated by sha512t instead of the public address of the owner.

Example:

{
    "title": "Directory name",
    "files_link": {
        "any_file.jpg": {"link": "/f/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754", "size": 3242},
        "other_dir/any_file.jpg": {"link": "/bf/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754", "size": 3821232}
    }
}

These directories can be accessed on the web interface using http://127.0.0.1:43110/d/{sha512t hash of generated content.json}/any_file.jpg (file list can be displayed on directory access)

Downloaded files and the content.json are stored in the data/__static__/[download_date]/{Directory name} directory.

Each file in the directory is also accessible using http://127.0.0.1:43110/f/602b8a1e5f3fd9ab65325c72eb4c3ced1227f72ba855bef0699e745cecec2754/any_file.jpg

As an optimization, if the files are accessed using a directory reference, the peer list can be fetched using findHashId/getHashId from other peers without accessing the trackers.
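
A sketch of how a directory content.json like the example above could be generated and turned into a /d/ address. The proposal does not say how the content.json is serialized before hashing, so the canonical JSON encoding used here (sorted keys, no extra whitespace) is an assumption, as are the function names and the truncated SHA-512 hash.

```python
import hashlib
import json


def build_directory_manifest(title: str, files: dict) -> dict:
    """files maps a relative path to a (hash, size, is_big_file) tuple."""
    files_link = {}
    for inner_path, (file_hash, size, is_big) in sorted(files.items()):
        prefix = "/bf/" if is_big else "/f/"
        files_link[inner_path] = {"link": prefix + file_hash, "size": size}
    return {"title": title, "files_link": files_link}


def directory_url(manifest: dict) -> str:
    # Assumption: hash a canonical JSON encoding so that every client
    # derives the same address from the same manifest.
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha512(encoded).hexdigest()[:64]
    return "http://127.0.0.1:43110/d/" + digest + "/"
```

Because the address is derived from the manifest bytes, changing any file link or size yields a different directory address.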

Possible problems

Too many tracker requests

Announcing and keeping track of peers for a large number (10k+) of files can be problematic.

Solution #1

Send tracker request only for large (10MB+) files. To get peer list for smaller files we use the current, getHashfield / findHashId solution.

Cons:

It could be hard/impossible to find peers for small files if you are not connected to a site where that file is popular.
Hash collisions, as we use only the first 4 characters of the hash in the hashfield
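
To get a feeling for the collision problem, here is a small birthday-bound calculation. It assumes the 4 characters are hexadecimal digits, so the hashfield can distinguish 16^4 = 65536 values; the function name is made up for this example.

```python
def collision_probability(file_count: int, prefix_hex_chars: int = 4) -> float:
    """Probability that at least two of file_count random hashes share a prefix."""
    space = 16 ** prefix_hex_chars
    if file_count > space:
        return 1.0  # pigeonhole: more files than distinct prefixes
    p_unique = 1.0
    for i in range(file_count):
        p_unique *= (space - i) / space
    return 1.0 - p_unique
```

Under that assumption the odds of at least one shared prefix are already about even at around 300 files, so with 10k+ files a hashfield match can only be treated as a hint that still has to be verified against the full hash.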

Solution #2

Announce all files to zero:// trackers and reduce re-announce time to e.g. 4 hours (re-announce within 1 minute if a new file is added). Sending this number of requests to BitTorrent trackers could be problematic. Don't store peers for files that you have 100% downloaded.

Request size for 10k files: 32 bytes * 10k = 320k bytes (optimal case)

Possible optimization #1:

Change tracker communication to request client id token and only communicate hash additions / deletions until the expiry time. Token expiry time extends with every request.
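
The token idea could look roughly like this on the tracker side. Everything here is hypothetical (class name, method names, in-memory storage); the 4 hour lifetime is borrowed from the re-announce interval mentioned in Solution #2, not from an actual protocol definition.

```python
import secrets


class TrackerSessions:
    """Hypothetical tracker-side state for token-based delta announces."""

    def __init__(self, ttl: float = 4 * 60 * 60):
        self.ttl = ttl
        self.sessions = {}  # token -> {"expires": float, "hashes": set}

    def open(self, now: float) -> str:
        # The client requests a token once, then only sends changes.
        token = secrets.token_hex(16)
        self.sessions[token] = {"expires": now + self.ttl, "hashes": set()}
        return token

    def update(self, token: str, now: float, added=(), deleted=()) -> bool:
        session = self.sessions.get(token)
        if session is None or session["expires"] < now:
            self.sessions.pop(token, None)
            return False  # expired: the client has to start over with a full announce
        session["hashes"].update(added)
        session["hashes"].difference_update(deleted)
        session["expires"] = now + self.ttl  # every request extends the expiry
        return True
```

With this shape a client that adds one file sends one hash instead of its whole list, at the cost of the tracker keeping per-client state until the token expires.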

Possible optimization #2:

Take some risk of hash collision and allow the tracker to specify how many characters it needs from the hashes (based on how many hashes it stores).

Estimated request size to announce 22k files:

Full hash (32 bytes): 770k
First 6 bytes (should be good until 10m hashes): 153k
First 7 bytes (should be good until 2560m hashes): 175k
First 8 bytes (should be good until 655360m hashes): 197k
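
A quick way to sanity-check these numbers. The helper names are invented for this example, and the size function counts only the raw hash bytes, so it comes out below the estimates in the list, which presumably include encoding overhead.

```python
def raw_announce_size(file_count: int, prefix_bytes: int) -> int:
    """Payload size in bytes, counting only the hash prefixes themselves."""
    return file_count * prefix_bytes


def false_match_probability(stored_hashes: int, prefix_bytes: int) -> float:
    """Upper bound on the chance that an unrelated hash shares its prefix
    with one of the hashes the tracker already stores."""
    return stored_hashes / 256 ** prefix_bytes


for prefix_bytes, stored in ((6, 10_000_000), (7, 2_560_000_000), (8, 655_360_000_000)):
    print(prefix_bytes,
          raw_announce_size(22_000, prefix_bytes),
          false_match_probability(stored, prefix_bytes))
```

The limits in the list grow by a factor of 256 per extra byte, so the false-match bound is the same for all three rows (a few in 100 million); the tracker can keep that risk constant by asking for one more byte whenever its table grows 256-fold.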

Cons:

Depends on the zero:// trackers
Heavy requests, more CPU/BW load to trackers

Download all optional files / help initial seed for specific user

Downloading all optional files in a site or uploaded by a specific user won't be possible anymore: optional files will no longer be stored in the files_optional node of the user's content.json.

Solution #1

Add a files_link node to content.json that stores uploaded files in the last X days. (with sha512, ext, size, date_added nodes)
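
A possible shape for that node, as a sketch only. The field names sha512, ext, size and date_added come from the line above; storing the entries as a list, using Unix timestamps for date_added, and the function name are assumptions.

```python
import time
from typing import Optional


def recent_files_link(uploads: list, max_age_days: int, now: Optional[float] = None) -> list:
    """Keep only uploads from the last max_age_days for the files_link node."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [
        {key: item[key] for key in ("sha512", "ext", "size", "date_added")}
        for item in uploads
        if item["date_added"] >= cutoff
    ]
```

Older uploads drop out of content.json over time, which keeps the file small but means "download everything by this user" only covers the recent window.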

mkg20001 commented 5 years ago

@blurHY

When you always want to have compatibility, there's no space left for innovations then.

You should've read my ZeroNet-JS related comment: "That way we can experiment with new things, without breaking compatibility to the "mainnet" too much"

Also, libp2p is extensible. That's my point. If it works on a small scale, it can become part of libp2p and thus everyone profits.

@imachug

See, I said that most features are available right now, and switching to IPFS will take a lot more time than just adding one or two features

I'd go with that statement as well and we should also stop making this mess bigger than it is, by reusing parts that the IPLD/libp2p projects offer us and can be easily integrated especially for future additions, since that way we can finally stop re-inventing things.

@HelloZeroNet I'd really like to hear your opinion on that as well.

It seems that while in the centralized space we currently have movements to get away from silos such as big companies and move toward federation, a similar movement exists in the p2p space to make different protocols partly interoperable (such as Juan Benet's vision of "the Merkle forest") or build common base frameworks (such as libp2p is doing for network-related things) and thus make the best of all available to everyone, everywhere.

purplesyringa commented 5 years ago

More time? Do you think the application layer is harder than infrastructure?

Yay, some sensible discussion finally! Application layer is less difficult than infrastructure layer, but ZeroNet depends on its internals rather much, so changing the application-infrastructure bridge will take much time, and we could spend that time porting features from IPFS. We'll also stay separate from IPFS which is a feature. (i.e.: having a compatible interface is good, using the same implementation is bad. That's how competition works)

purplesyringa commented 5 years ago

Adding IPFS is troublesome, whilst using libp2p should be a lot easier (after all, that's just a very low-level protocol), and if it gives a lot more features than what we currently have, I'd go for it.

purplesyringa commented 5 years ago

@blurHY @mkg20001 You're working on a single project and working as a marketer and a developer (respectively), but it looks the opposite way round from my point of view :)

mkg20001 commented 5 years ago

@imachug Nice to hear! What about the other related parts, though? Like re-using IPLD + BitSwap for the exchange of objects across libp2p? It would make sense to use those as well, since they already integrate pretty well with libp2p and all custom parts of ZeroNet could be added as extensions to libp2p.

blurHY commented 5 years ago

some sensible discussion finally

I've already written about it, uh, you didn't see them.

will take much time,

No, it's very easy because we have stuff like orbit-db that has already done a lot for us.

But the really hard and interesting part is stage 2, blockchain portion.

Adding IPFS is troublesome, whilst using libp2p should be a lot easier (after all, that's just a very low-level protocol), and if it gives a lot more features than what we currently have, I'd go for it.

We can use both high-level stuff and access low-level stuff

mkg20001 commented 5 years ago

@imachug About the projects: There are two projects right now. ZeroNetJS (mine) and IPZN (ours). I'm mostly referring to ZeroNet-JS, which already has a big codebase (but sadly it's a messy one as well)

blurHY commented 5 years ago

@imachug About the projects: There are two projects right now. ZeroNetJS (mine) and IPZN (ours). I'm mostly referring to ZeroNet-JS, which already has a big codebase (but sadly it's a messy one as well)

Yeah, it can be integrated in IPZN for compatibility to ZeroNet

So compatibility is not a problem

purplesyringa commented 5 years ago

I've already written about it, uh, you didn't see them.

I'm sorry in this case, but I guess I just ignored it as it was surrounded by flood.

blockchain portion

I'd think of it as a bug actually -- using blockchain where it's not required is a bad idea.

We can use both high-level stuff and access low-level stuff

But that'll take a lot of effort and I don't see what we'll get in the end clearly yet.

mkg20001 commented 5 years ago

@blurHY How will it help to add ZeroNetJS's ZeroNet-incompatible features into IPZN for compatibility with ZeroNet? Could you please explain that to me?

blurHY commented 5 years ago

using blockchain where it's not required is a bad idea.

I'll explain it tomorrow, i can't clarify what it is in a few words

purplesyringa commented 5 years ago

Nice to hear! What about the other related parts, though? Like re-using IPLD + BitSwap for the exchange of objects across libp2p? It would make sense to use those as well, since they already integrate pretty well with libp2p and all custom parts of ZeroNet could be added as extensions to libp2p.

We'd better start with something obvious like a low-level protocol and add more stuff in the future. Also, that's the first time I've ever heard of BitSwap so I'll have to spend some time reading on that...

blurHY commented 5 years ago

How will it help to add ZeroNetJS's ZeroNet-incompatible features into IPZN for compatibility with ZeroNet? Could you please explain that to me?

Uh, if your project can be compatible with ZeroNet using libp2p, we can add this to IPFS as an IPFS plugin.

add ZeroNetJS's ZeroNet-incompatible features

what's this

blurHY commented 5 years ago

the first time I've ever heard of BitSwap

Haha, you don't know that

purplesyringa commented 5 years ago

Uh, if your project can be compatible to zeronet using libp2p, we can add this to IPFS as a ipfs-plugin.

I believe it makes a lot more sense to make IPFS a plugin, not the other way round...

mkg20001 commented 5 years ago

@imachug BitSwap is part of IPFS's way of exchanging IPLD data over the network. Once it's implemented in Python as well, it might be useful, since that reduces the maintenance burden for @HelloZeroNet. It's explained in the paper that describes other parts of IPFS as well, such as its DAG.

purplesyringa commented 5 years ago

Haha, you don't know that

Haha, you don't know what abjasdgljkhgjklhsdfljkgh is.

blurHY commented 5 years ago

to make IPFS a plugin

Anyways ZeroNet is not well designed.

So we should focus on its successor

mkg20001 commented 5 years ago

Haha, you don't know that

@blurHY Welcome to real-life, where not everyone knows everything. Put your schadenfreude somewhere else. Don't know what that means? Ha... no, really, stop it...

blurHY commented 5 years ago

Haha, you don't know that

@blurHY Welcome to real-life, where not everyone knows everything. Put your schadenfreude somewhere else. Don't know what that means? Ha... no, really, stop it...

I have 20 GB of offline dictionaries ...... by simpling ctrl+c ctrl+alt+c

mkg20001 commented 5 years ago

Anyways ZeroNet is not well designed.

So we should focus on its successor

@blurHY That's not how it works. Only a gradual transition is (in most cases) even feasible. And you're not going to suddenly find your way around that!

purplesyringa commented 5 years ago

Anyways ZeroNet is not well designed.

Prove that. Adding features to ZeroNet is easy, you can't call that bad design.

blurHY commented 5 years ago

you can't call that bad design.

IPLD's design is much better

purplesyringa commented 5 years ago

IPLD's design is much better

"No u".

mkg20001 commented 5 years ago

I have 20 GB of offline dictionaries ......

@blurHY

That says what exactly? That you're good at storing data? I have internet access as well, just fyi.

Look, at this point it doesn't even look like you want to argue, you just want to troll. Better be quiet than throw out such meaningless nonsense.

blurHY commented 5 years ago

@imachug Because there's a known better solution/design than ZeroNet

blurHY commented 5 years ago

I have internet access as well,

That's fast

PS: I found now this issue is almost the most commented one on ZeroNet

purplesyringa commented 5 years ago

Because there's a known better solution/design than ZeroNet

Which. One? After you find it, prove that it's better.

mkg20001 commented 5 years ago

Because there's a known better solution/design than ZeroNet

Then why are you even bothering to argue about compatibility with ZeroNet? Go ahead, do your own thing, I doubt it will be more compatible than ZeroNet currently is.

Why are you even putting your ego so much into this? I've seen the world, and there are times when X is better than Y and then there are times where it's the opposite way.

Sometimes Rust is the better tool to make a thing, sometimes it's JavaScript. Sometimes it's both. Sometimes it's neither. The ends justify the means. What's your end for IPZN? Unity of all protocols? Or dominance/betterness above others?

blurHY commented 5 years ago

Because there's a known better solution/design than ZeroNet

Which. One? After you find it, prove that it's better.

IPFS

It's basically impossible to explain in a few sentences, you should learn yourselves

purplesyringa commented 5 years ago

It's basically impossible to explain in a few sentences, you should learn yourselves

In this case, go write your own ZeroNet-like network on top of IPFS. And don't argue with "I can't do that myself", nofish made ZeroNet without anyone's help.

mkg20001 commented 5 years ago

@blurHY

To continue my comment above:

If unity is on your mind, it would only be reasonable to continue supporting all others as well, to keep them for the specific use cases where they are superior.

But one-size-fits-all has been the biggest joke of all times.

There are moments where IPFS is good, there are moments when something as plain and simple as "SSH to my server and download that file" is the better one.

Global, local, private, there are many contexts and what matters in each is a different story. KBFS is good for small teams but horrible for public stuff, IPFS is great for publicizing information but bad for privatizing, Tor is good for anonymity but horrible for speed.

So, with that said, what's your standing point in this debate?

purplesyringa commented 5 years ago

Do you understand how nonsense it is to tell Facebook guys that React is awful and they should switch to Vue? That's about the same. These are just two different things, and, while they're designed for about the same, they're different; sometimes React is better, sometimes Vue is.

blurHY commented 5 years ago

If unity is on your mind, it would only be reasonable to continue supporting all others as well,

For example, Dat will not be supported, because it overlaps with IPFS and IPFS has more features.

IPFS is currently the best solution, so I won't give up features that only exist on IPFS for the sake of compatibility

blurHY commented 5 years ago

That's about the same

Here that's not the same

mkg20001 commented 5 years ago

So, with that said, what's your standing point in this debate?

@blurHY I want this specific and meta question answered. And nothing more. All else was just context.

purplesyringa commented 5 years ago

Here that's not the same

"IPFS is better! Go learn why yourself! I'm not going to spend my time discussing that!" Ok, we're not going to spend our time supporting your project. Bye.

blurHY commented 5 years ago

So, with that said, what's your standing point in this debate?

@blurHY I want this specific and meta question answered. And nothing more. All else was just context.

ZeroNet is a toy project

mkg20001 commented 5 years ago

ZeroNet is a toy project

@blurHY Says who? It's being actively used. And if you're using Linux, then congrats on using the world's biggest toy project, if that is your point. Past intent may not always equal future intent.

If you can't answer it properly or you still think your arguments are the ones always being superior, then feel free to leave. I'll wait.

You have two users trying to explain to you why you're not the one that is always right, and neither are we.

I agreed with @imachug for example that BitSwap isn't easy to implement and it might make sense to postpone it until we have libp2p added.

But no. You just continue spewing nonsense.

purplesyringa commented 5 years ago

ZeroNet is a toy project

Remember how Apple started.

blurHY commented 5 years ago

Discuss it tomorrow

mkg20001 commented 5 years ago

@blurHY All I wanted to get you to understand was that hammering screws and screwing nails are both equally stupid, and that's why there are different tools that work for different problems. Protocols like Tor have pros and cons, as do ZeroNet and IPFS.

But you claim a hammer to be the world's best tool and everyone else should be hammered down in your opinion.

That's not how you get support. That's how you get hate.

Life is about making the lives of others better, such that your own life improves within the process. And not commanding others to do it for you or forcing your beliefs upon them.

blurHY commented 5 years ago

IPZN and ZeroNet are not different tools.

purplesyringa commented 5 years ago

Ok, so it looks like content-addressed data is what's better in IPFS (or, at least, that's what you think).

Question 1: how to serve dynamic sites? IPNS sounds like the only option. Same with user-content.

Question 2: this proposal looks rather neat and should fix centralization problems caused by site-based architecture. It looks like we could do the following:

Allow content-addressed files;
Allow content-addressed directories (with a merkle tree);
(this proposal references the above only)
Allow content.json's (or similar) for content-addressed directories to control who can post user content;
IPNS alternative for directory addresses.

By this time, we basically just rebuilt ZeroNet and returned to hub-based architecture. Now, if that's not what you're looking for, where did I miss your point?

HelloZeroNet commented 5 years ago

The topic starter proposal hashes individual files, not directories, and is not a replacement for the current, site-based storage.

Use cases: User file uploads, media files on sites

blurHY commented 5 years ago

how to serve dynamic sites?

Hmm, read the wiki carefully please https://gitlab.com/ipzn/ipzn/wikis/home#ipfs

Any decentralized web can be summarized as two portions, a global and decentralized file system, and a communicator.

purplesyringa commented 5 years ago

and not directories

I'd recommend that one though. It's rather common to group related files to a single directory (or even to nested directories). I'd really love that feature.

purplesyringa commented 5 years ago

@blurHY Right, but that doesn't answer my question: if we have files, and file trees are signed by user/site/hub/whatever keys, how's that different from ZeroNet?

blurHY commented 5 years ago

how's that different from ZeroNet?

There's no difference between user content and site content naturally.

We are just gathering related data.

purplesyringa commented 5 years ago

Ok, so, basically, do you just want user data to be independent of site owner? In this case, this change should be rather easy.

HelloZeroNet / ZeroNet

Proposal: Content addressed data #2192
