Closed ORBAT closed 5 years ago
@ORBAT - I'm really sorry about the slow reply here. I completely missed the notification about this PR due to email config. Should be fixed now.
as it currently stands, cmd/noms can't interact with IPFS stores unless it's in the main noms repo.
eep. good point.
The main reason to pull out the IPFS code was to work around some dependency issues we have experienced with it (e.g., https://github.com/attic-labs/noms/issues/3786).
I'm not sure what the right solution to that is, but on future reflection, if people are going to be actively working with the IPFS support (e.g., you), then we might as well have the code in this tree for easier maintainability rather than separate.
So I support moving this back in.
That said, I don't think that it should use the 'external protocol' concept if it is linked into the binary.
Hi, thanks for the reply and the feedback; no worries about how long it took.
I'll check the current IPFS dependencies and see if #3786 is still an issue. Getting rid of gx dependencies may or may not be an option, dunno yet.
Hi @aboodman, I've got a couple of questions (with likely more incoming since I'm trying to learn the ropes of noms internals, so please bear with me.)
Blockstore
/ Blocks
Regarding the reasoning behind having ipfs-local
use Blockstore
, instead of creating an offline node with core.BuildCfg{Online: false}
. Was the intention to have the node always be online so the pubsub subsystem will still work regardless of whether the user wants the actual blocks to be exposed to the wider IPFS network?
Do blocks stored directly via the Blockstore
API never get exposed to the network? To me it seems like the BlockService
always gets initialized with a reference to the Blockstore
, and the the Exchange
is networked if cfg.Offline
is true. So if I've understood this correctly, adding blocks via Blockstore
does expose them to the network; it's just that the current IPFS chunkstore code limits the its ability to get blocks from the network by only using Blockstore
in "local mode."
The reason I'm asking this is that it'd be neat to have just one protocol, ipfs
, for both the local and networked use cases, with the configuration defining whether to enable networking or not. As far as I can tell, this'd remove the need for the different code paths for local vs networked chunk stores (if cs.c.local { ... } else { ... }
). Of course, having a non-networked node would mean the pubsub system wouldn't work, so that'd require some refactoring of the IPFS example.
What was the reasoning behind limiting concurrency to 1 in the chat example? Is it mainly to limit the number of concurrent readers/writers of the root hash file? Or is the Blockstore
not thread-safe? I haven't looked into that yet, and IPFS's docs didn't seem to mention anything about thread-safety at a quick glance.
Rebase
I'm a bit unclear on the concept of root and Rebase
.
I sort of understand root's meaning for datasets and databases, but what's its meaning in the chunkstore context? ** The database root is Map<String, Ref<Commit>>
pointing to datasets, but if I've understood correctly, multiple databases can be backed by the same chunkstore. Does a chunkstore root point to 1+ database roots?
What is Rebase
used for, in both the database and chunkstore contexts? As in, in what sort of situations would one need to do a rebase? I've seen it used in e.g. the memory stores with views, which is understandable (brings the view in line with the "top level" memory store), but I'm not clear on its use in other cases.
@ORBAT - I'm so sorry (again).
I'm not sure why I keep missing these mails. If I'm unresponsive, can you please just ping the thread again. I'm trying to spend more time on Noms the last few weeks, and I'd love to have contributions. I think right now the traffic is just low enough that it's easy to miss in my inbox.
You can also just shake my tree at any other channel (e.g., aboodman on Twitter). But hopefully now that I'm spending time on this again, it won't continue to happen.
Regarding your questions:
more incoming since I'm trying to learn the ropes of noms internals
Awesome.
Regarding the reasoning behind having
ipfs-local
useBlockstore
, instead of creating an offline node withcore.BuildCfg{Online: false}
. Was the intention to have the node always be online so the pubsub subsystem will still work regardless of whether the user wants the actual blocks to be exposed to the wider IPFS network?
It's hard to remember, and it's possible we didn't even explore the Online
flag. But it's true that we would have wanted the pubsub system to work in both modes.
The ipfs
/ipfs-local
distinction was a bit of a quick hack and with more time I hope we could do something better. IIRC (and my memory is very foggy) the issue was that Noms has a deep assumption that a dag is always complete. Meaning that if a particular chunk is present, then all chunks reachable from it will also be present.
Because of the magic of the IPFS network, with ipfs
we would frequently end up with chunks present locally but not their children. This was bad news for Noms.
I don't think we ever really got the ipfs
scheme working well.
Do blocks stored directly via the
Blockstore
API never get exposed to the network? To me it seems like theBlockService
always gets initialized with a reference to theBlockstore
, and the theExchange
is networked ifcfg.Offline
is true. So if I've understood this correctly, adding blocks viaBlockstore
does expose them to the network; it's just that the current IPFS chunkstore code limits the its ability to get blocks from the network by only usingBlockstore
in "local mode."
The latter was the goal IIRC. Perhaps the special case in the put path is not necessary!
The reason I'm asking this is that it'd be neat to have just one protocol,
ipfs
, for both the local and networked use cases, with the configuration defining whether to enable networking or not. As far as I can tell, this'd remove the need for the different code paths for local vs networked chunk stores (if cs.c.local { ... } else { ... }
). Of course, having a non-networked node would mean the pubsub system wouldn't work, so that'd require some refactoring of the IPFS example.
I actually came to the conclusion over time that the ipfs
path was not very useful. If I were doing this in a principled way I would just be using pubsub and libp2p directly, and then having them talk to a normal nbs database.
Concurrency
What was the reasoning behind limiting concurrency to 1 in the chat example? Is it mainly to limit the number of concurrent readers/writers of the root hash file? Or is the
Blockstore
not thread-safe? I haven't looked into that yet, and IPFS's docs didn't seem to mention anything about thread-safety at a quick glance.
No idea! Can you give me a pointer to the code in question?
Chunkstore roots /
Rebase
I'm a bit unclear on the concept of root and
Rebase
.
I'm so sad I didn't catch this question faster. Cool to have someone digging in.
I sort of understand root's meaning for datasets and databases, but what's its meaning in the chunkstore context? ** The database root is
Map<String, Ref<Commit>>
pointing to datasets, but if I've understood correctly, multiple databases can be backed by the same chunkstore. Does a chunkstore root point to 1+ database roots?
It's not the case that multiple databases can be backed by one ChunkStore.
The difference between ChunkStore
and Database
is basically that Database
knows about types.Value
and ChunkStore
doesn't.
So when you write something to Database
the thing you write is a types.Value
. Database
does all kinds of validation to ensure the system at the end of the day still looks like a valid database. It makes sure that when you commit a value, all the values it depends on are present.
ChunkStore
doesn't do any of that. You can, erm, put chunks in it, and get them out. The only other thing of significance that you can do with it is read/write to a single unnamed mutable register. Writes to this register have certain concurrency guarantees on which the entire system depends.
Basically if you think of a stack of components, with your code at the top, and the OS at the bottom, Noms is in the middle. The ChunkStore
interface is the boundary between the OS and Noms. It represents the minimum requirements Noms has on the underlying storage system in order to function properly.
Database
is the boundary between Noms and you. So whereas ChunkStore's
design goal is to be absolutely minimal, Database
's design goal is to be high-level, useful, ergonomic, etc.
What is
Rebase
used for, in both the database and chunkstore contexts? As in, in what sort of situations would one need to do a rebase? I've seen it used in e.g. the memory stores with views, which is understandable (brings the view in line with the "top level" memory store), but I'm not clear on its use in other cases.
Rebase()
is useful occasionally when you know that the underlying storage may have changed and you want to bring yourself up to date with it.
Think of like a webserver that is servicing requests. If this server caches a single database instance across all requests (which would be a reasonable thing to do) then the server would want to rebase the db before each request to ensure it is seeing the latest state of the world.
The reason it is called Rebase
is because like git rebase
it brings along its current state. So for example if you do Put
a few times, then all Rebase
, then call Commit
, the chunks that you put will still be visible to commit. Whereas if you just dropped the db and built a new one, you'd lose that state. I think there is also some cached state that comes along for the ride too, which makes Rebase
more efficient that simply building a new database instance, but can't recall the details.
===
HTH, and hope you're still interested.
Closing this for now since I've moved onto other projects
This PR reintroduces the IPFS chunkstore. It also makes spec's external protocol registration thread-safe.
Commit 217ba38b by @aboodman stated that
but as it currently stands, cmd/noms can't interact with IPFS stores unless it's in the main noms repo.
Some notes regarding this PR:
d/try.go
after usinggithub.com/pkg/errors
. What might actually be worth while is refactoring try.go to use pkg/errors (see e.g. this blog post) as getting textual information about error context is extremely handyChunkStoreTestSuite
should probably be refactored so that it can be used for testing the IPFS chunkstore as well. I actually tried doing that (including adding aFactory
for the IPFS chunkstore) but tests kept freezing so I scrapped itSetPortIdx
option should probably be complemented with separate options for setting the API/gateway/swarm ports (although only the swarm port seemed to be open when eg. running ipfs-chat)