ProtoSchool / protoschool.github.io

The code that runs the ProtoSchool website. Visit https://proto.school for interactive tutorials on decentralized web protocols. Explore IPFS and Filecoin through code challenges, code-free lessons, and local events.

Lesson Feedback: Content Addressing - Lesson 3 (The decentralized web: Content addressing) #665

Open ericrpatton opened 3 years ago

ericrpatton commented 3 years ago

Have a question or suggestion regarding a specific ProtoSchool lesson? Please use this template to share it!

URL of the lesson that's confusing: https://proto.school/content-addressing/03

What's confusing about this lesson? The text in this lesson states:

"When we want a specific photo of an adorable pet, we ask for it by its content address (hash). Who do we ask? The whole network! If Ada is online, we'll see that she has the content we're looking for, and we'll know that it's exactly the file we need because it has a matching hash."

What if I don't know the hash of the data I am searching for? This is my big problem with IPFS: it presumes we already know the CID of what we're looking for. 99.9999% of the time, when I search for something on DuckDuckGo, I have no idea what the properties of the data I am looking for are, particularly its hash. How am I supposed to find anything on IPFS? The documents never seem to address this.

What additional context could we provide to help you succeed? Tell me how I can find data from anonymous strangers without already knowing the hash or CID of the data I need in advance. This seems so basic.

What other feedback would you like to share about ProtoSchool?

terichadbourne commented 3 years ago

Thanks for the great question, @ericrpatton! I'm still pretty new to the space myself, but I'll give this a shot...

You're right that there's no way to say directly to the IPFS network, "Find me all the pictures of cute cats!" However, there actually isn't anything like that directly in HTTP (the equivalent web protocol we're more accustomed to) either. Instead, we rely on a bunch of indexing tools that have been built up in the ecosystem.

To explore this question we have to think of the data (image, document, etc.) separately from the index of that data. On the centralized web, when I publish a website, I include metadata such as a title, a language, an image for a social card, etc. Search engines like Google or DuckDuckGo are looking at that metadata rather than the content itself when they crawl sites and build indexes.
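
To make the split between data and index concrete, here's a toy sketch in Python. It is not how IPFS is implemented: the "address" is just a SHA-256 hex digest standing in for a real CID, the "network" is a plain dictionary, and names like publish, fetch, and search are made up for illustration. The point is only that the keyword index lives apart from the data and hands back addresses, which you then use to fetch and verify the content.

```python
import hashlib
from typing import Dict, List, Set


def address_of(data: bytes) -> str:
    """Toy content address: a SHA-256 hex digest (a stand-in, not a real CID)."""
    return hashlib.sha256(data).hexdigest()


# The "network": every piece of content is stored under its own hash.
store: Dict[str, bytes] = {}


def publish(data: bytes) -> str:
    """Store the data under its content address and return that address."""
    addr = address_of(data)
    store[addr] = data
    return addr


def fetch(addr: str) -> bytes:
    """Retrieve data by address and check that it really hashes to that address."""
    data = store[addr]
    if address_of(data) != addr:
        raise ValueError("content does not match its address")
    return data


# The index: kept separately from the data, mapping keywords to addresses.
index: Dict[str, Set[str]] = {}


def add_to_index(addr: str, keywords: List[str]) -> None:
    """Record human-friendly keywords for an address."""
    for word in keywords:
        index.setdefault(word.lower(), set()).add(addr)


def search(word: str) -> Set[str]:
    """Look up a keyword and return the matching addresses (possibly none)."""
    return index.get(word.lower(), set())


cat_addr = publish(b"pretend these bytes are a cat photo")
dog_addr = publish(b"pretend these bytes are a dog photo")
add_to_index(cat_addr, ["cat", "cute", "pet"])
add_to_index(dog_addr, ["dog", "cute", "pet"])

cute_results = search("cute")  # addresses of both photos
cat_results = search("cat")    # address of the cat photo only
cat_photo = fetch(next(iter(cat_results)))
```

In this sketch, searching never touches the stored bytes. It only consults the index, and the hash check in fetch is what lets you trust the result even if the index was built by a stranger.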

Interestingly, when you use DuckDuckGo, you may already be finding sites like ProtoSchool that are hosted on an IPFS gateway right alongside the sites hosted in more traditional ways.

One of the clues that search engines can use when indexing content is the URL strings themselves. On the traditional web, this might be something like host.site/images/animals/dog.jpg. As you've seen in this tutorial, there's no guarantee that there's actually a photo of a dog at that address, but it still feels like a relevant clue. While IPFS CIDs usually look like a string of gobbledegook and therefore don't include such clues, you can actually create a directory structure that uses filenames if you use the Mutable File System. When you do that, you can refer to the root of the directory by its CID but then show the rest as directory and filenames, like so: /Qm23058...234j0kmlk3/images/animals/dog.jpg. Check out our Mutable File System tutorial to learn more.
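
Here's a rough sketch of how a path like that can be resolved, again in Python and again heavily simplified. It assumes each directory is stored as a small JSON map from names to child addresses, with SHA-256 hex digests standing in for CIDs. Real IPFS encodes directories differently, and put, put_dir, and resolve are hypothetical helper names, so treat this as an illustration of the idea rather than the actual mechanism.

```python
import hashlib
import json
from typing import Dict

# Toy content-addressed store: address -> bytes.
store: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store raw bytes under their SHA-256 hex digest (a stand-in for a CID)."""
    addr = hashlib.sha256(data).hexdigest()
    store[addr] = data
    return addr


def put_dir(entries: Dict[str, str]) -> str:
    """Store a directory as a JSON map of name -> child address."""
    return put(json.dumps(entries, sort_keys=True).encode("utf-8"))


def resolve(path: str) -> bytes:
    """Resolve '/<root address>/name/name/...' by walking one directory per name."""
    root, *names = path.strip("/").split("/")
    addr = root
    for name in names:
        directory = json.loads(store[addr])
        addr = directory[name]
    return store[addr]


# Build the tree from the leaf up, because each directory's address
# depends on the addresses of its children.
dog = put(b"pretend these bytes are a dog photo")
animals = put_dir({"dog.jpg": dog})
images = put_dir({"animals": animals})
root = put_dir({"images": images})

photo = resolve(f"/{root}/images/animals/dog.jpg")
```

Only the root is referred to by its hash. Everything after it is a readable name, which is the kind of clue an indexing tool could pick up on.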

As the decentralized web ecosystem continues to develop, we're likely to see more indexing tools emerge, but the metadata for this indexing would continue to live separately from the data itself. For example, any data stored on a public blockchain can be indexed by search engines, and many folks who create NFTs store the art (or other data) itself on IPFS while storing the metadata for the deal on the blockchain. (Apologies if this example is too obscure to be helpful. 😂 )

Hope at least some of this is helpful! Would love to hear what here clicks and what's still confusing so I can get better at explaining it.