gedw99 closed this issue 1 year ago
@gedw99 Interesting idea, can you elaborate?
For files larger than 16 MB it is better to use GridFS, but I don't know if it has Change Stream support.
For smaller files, however, why would we need to chunk them using the NATS object store?
Thanks for the input!
I also don’t know for sure if GridFS supports change streams. I did look at the docs and it seems to, but until we try it I’m not certain.
Agreed, chunking isn’t needed for small files under 16 MB.
The idea of using the NATS Object Store is so that you can get the change stream of the file and send it via NATS. The reason is that you can then keep a cluster of MongoDB nodes in sync: the other nodes get the change via the NATS Object Store and update their own instance. I am using Marmot to do this: https://github.com/maxpert/marmot
Marmot only supports SQLite currently, but I and others are breaking it out to be flexible for other DBs. Marmot uses NATS to do its thing.
Hey @damianiandrea
I got this working.
There are as you suggested a few ways to approach this.
There are different file systems I need to watch: local, MinIO (S3), and MongoDB GridFS.
All 3 use the same semantics that you formalised for Create, Update, and Delete, so on NATS it’s consistent. I am in two minds about whether adding an indicator of provenance is worthwhile. It would mean that the NATS payload has a field that indicates which system produced the change.
Hi @gedw99,
Thanks a lot for your feedback, as soon as I have some time to spare I'll look into it!
If it’s not needed for your project, just say :)
I reckon I can adapt what’s here to do it, but I haven’t really looked deeply into it yet.
Hi @gedw99,
I double checked a few things:
You were asking if files are supported, and they are. You can store files in a MongoDB collection, watch that collection, and its changes will be published on NATS JetStream. Same thing for GridFS: all you need to do is watch the `files` and `chunks` collections, and this works out of the box.
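As a rough illustration of what watching a GridFS bucket involves, here is a stdlib-only Go sketch that derives the two collection names to watch. The function name is made up for illustration; the actual change-stream call (the Go driver's `Watch` on each collection) is only referenced in a comment:

```go
package main

import "fmt"

// gridFSWatchTargets returns the two collections backing a GridFS bucket:
// "<bucket>.files" holds file metadata, "<bucket>.chunks" holds the binary
// chunks. Watching both with change streams covers creates, updates, and
// deletes of stored files.
func gridFSWatchTargets(bucket string) []string {
	if bucket == "" {
		bucket = "fs" // GridFS default bucket name
	}
	return []string{bucket + ".files", bucket + ".chunks"}
}

func main() {
	// With the official Go driver, each of these would be opened as a
	// change stream, e.g. db.Collection(name).Watch(ctx, mongo.Pipeline{}).
	for _, name := range gridFSWatchTargets("") {
		fmt.Println(name)
	}
}
```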
What is currently not supported is chunking and storing files on the NATS Object Store, but I don't think that would be necessary. If you're dealing with files larger than 16 MB, you should use GridFS and let it handle the chunking/storing: chunks have a default size of 255 KB, and NATS streams support messages of up to 64 MB, so I don't see any reason to use the NATS Object Store. For smaller files, even more so.
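The arithmetic above can be checked with a small sketch, assuming the GridFS default chunk size of 255 KiB and taking the 64 MB figure from the discussion as the message-size ceiling:

```go
package main

import "fmt"

const (
	gridFSChunkSize = 255 * 1024       // GridFS default chunk size: 255 KiB
	natsMaxMessage  = 64 * 1024 * 1024 // message-size ceiling quoted in the thread
)

// chunkCount returns how many GridFS chunks a file of the given size produces.
func chunkCount(fileSize int64) int64 {
	return (fileSize + gridFSChunkSize - 1) / gridFSChunkSize // ceiling division
}

func main() {
	fileSize := int64(16 * 1024 * 1024) // a 16 MiB file, the BSON document limit
	fmt.Printf("chunks for a 16 MiB file: %d\n", chunkCount(fileSize))
	fmt.Printf("each chunk fits in one NATS message: %v\n", gridFSChunkSize <= natsMaxMessage)
}
```

So even the largest single-document file fans out into modest 255 KiB chunks, each far below the message-size ceiling.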
Afterwards, you mentioned how you would like to watch different file systems, such as a local fs or MinIO. This would be out of scope because this is a MongoDB-NATS connector.
Let me know if you agree with my thought process or if I misunderstood something! I'm always open to suggestions :)
Thanks @damianiandrea
Really looks promising.
I currently chunk 20 GB files into and out of NATS and it works well.
I was planning for MongoDB to consume files and data off NATS.
Then MongoDB CDC would emit that the file changed, allowing me to tell downstream systems that it changed.
So NATS can then be used to keep many MongoDB DBs in sync, or to tell downstream workers about changes in MongoDB so they can do whatever they need, typically materialising data within some custom transforms.
I am guessing you might be using mongodb as the first mutation layer and then emitting the cdc events out to nats, and on to downstream systems?
Hi @gedw99,
Yes you are correct. However, it's in the plans to add data sourcing from NATS to MongoDB, so I think that's what you'll need to achieve the synchronization you were talking about! :)
Thanks - yep we get each other
Conduit already does all this. It’s got a NATS JetStream connector in it:
https://pkg.go.dev/github.com/conduitio-labs/conduit-connector-nats-jetstream
Check it out!
Also, for CDC: https://github.com/conduitio-labs/conduit-connector-mongo
sorry forgot to give you this one :)
MongoDB can store files, and NATS can chunk them using the NATS Object Store.
This would allow reacting to file changes, but also replicating them.