Closed alvaroloes closed 4 years ago
Short answer:
Long answer:
You've done a great job understanding Noms from somewhat poor documentation. Nice.
1, 2, and 4 are all basically equivalent from an efficiency perspective. They are the same as "full table scans" in a relational database. 4 is barely implemented, so I don't recommend it.
3 would be the right way to do it more efficiently than a full scan. Within 3, you are also correct that there are several options:
Map<K, Person>
- This is equivalent to a covered index. It is the fastest because it does the fewest reads, but uses the most space because the data is copied into the index. Note that if Person
is a large struct, only about 4kb of it on average per-value will be copied into the index. The rest will be shared with other instances of the same Person elsewhere in the db automatically.Map<K, PersonID>
- Assuming that you have another map somewhere like Map<PersonID, Person>
, this is analogous to a traditional secondary index in a database. It uses only a little extra space. "Join" can also be efficiently implemented so that this will end up only do a small amount of extra reads.Map<K, Ref<Person>>
- Assuming you also have another map somewhere like Map<PersonID, Ref<Person>>
this is most space efficient but slowest because every person fetched from either the index or primary table will be its own read.Thank you very much for answering so quickly and in detail.
I don't know how I missed the nomdex
example! It is exactly what I was looking for before submitting the issue.
So here are some more questions:
As shown in the nomdex
example, manual indexes cover searches and I can infer that also basic sorting and pagination. However, it seems that, for example, to get the "first 10 people called Ted" I would need to fetch all the people called Ted (which can be a lot) and then take the first 10 at the application level.
Auto-suggestion: Maybe I could avoid fetching all the records and get just the amount I want by doing the following:
IterAt
or IterRange
checking again the criteria in the callback function and stopping when I have the amount I wanted.Is there any plan to implement a query system that supports at least filtering, sorting and pagination?
Is there any plan to add automatic creation/maintenance for indexes? From your previous comment "Noms doesn't have automatic index maintenance yet", it seems so. If this is correct, any idea of when it would be available?
Don't get me wrong, I don't want to put any kind of pressure. I just love the concept and the project and want to have some idea of the status and the roadmap. It would be very useful for several projects in my company too.
I'm a developer and am willing to book some spare time to help with any of the above features 😉
As shown in the nomdex example, manual indexes cover searches and I can infer that also basic sorting and pagination. However, it seems that, for example, to get the "first 10 persons called Tedd" I would need to fetch all the persons called Tedd (which can be a lot) and then take the first 10 at the application level. Auto-suggestion: Maybe I could avoid fetching all the records and get just the amount I want by doing the following: Search the index (which is already sorted by name) and get the first person matching the criteria Use IterAt or IterRange checking again the criteria in the callback function and stopping when I have the amount I wanted.
Yep this is how you would do it. (this is exactly how it would be implemented by a query system too).
Is there any plan to implement a query system that supports at least filtering, sorting and pagination?
No, just vague ideas.
Is there any plan to add automatic creation/maintenance for indexes? From your previous comment "Noms doesn't have automatic index maintenance yet", it seems so. If this is correct, any idea of when it would be available?
Again, not really.
Typically applications that use Noms handle this at the application layer because their data model is much more restricted than Noms' so it's quite a lot easier.
I'm open to it, I just don't know what the general design looks like right now.
Can you say more about your use case and what you're looking for? Perhaps I can help you build something specific for that use case, and it can also feed into what a general system might look like.
Thank you!
Can you say more about your use case and what you're looking for? Perhaps I can help you build something specifically for that use case, and it can also feed into what a general system might look like.
We are creating a project that follows an event-driven architecture similar to CQRS. There are many parts in the system, but one of them is the "events store": A record of every fact that has happened in the system.
When those facts occur, they are emitted as events (e.g. "user-created") and are stored in the "events store". It acts as the real "source of truth" and must be event-sourced. What I mean with this is that we should be able to traverse the flow of events and stop at any moment in time to know the status of the system at that moment.
There are several approaches to achieve this, but Noms seems like a good fit in this scenario (that's what I want to confirm) as we can go back to any previous commit and already have the system status ready (instead of "reducing" all the events from the latest snapshot to the moment we are interested in).
One more question: Which is the kind of projects where Noms naturally fits as it is right now? I'm especially interested in the use-cases you had in mind that motivated the creation of Noms.
Thanks for being so responsive!
There are several approaches to achieve this, but Noms seems like a good fit in this scenario (that's what I want to confirm) as we can go back to any previous commit and already have the system status ready (instead of "reducing" all the events from the latest snapshot to the moment we are interested in).
Exactly.
Which is the kind of projects where Noms naturally fits as it is right now? I'm especially interested in the use-cases you had in mind that motivated the creation of Noms.
Noms was originally intended to support a particular application which never saw the light of day (something sort of similar to Perkeep).
I've lately come to learn about this 'event sourcing' pattern, and I do think there are some natural places for Noms to fit there.
Hi,
I've been reading all the Noms documentation available because it could fit for a project in my company. I love the concept and I think it has huge potential. So far I have confirmed all the characteristics we need except one: querying and filtering list, maps and sets.
Imagine I have a list of persons stored in Noms. The person data would be:
Let's say I have a dataset called
persons
which contains a list of a million persons.How would I find "All women with an age between 20 and 30 called María"?
Here are my guesses with my concerns. Could you please let me know which one would be the most suitable approach?:
Use the
iterAll
method with an anonymous function that will check if the current person matches the query and then store it in a "results" slice. Concern: This is pretty inefficient as I'm traversing the whole List element by element.Load the list into my Go application memory and do the search more efficiently. I could index the results as I load the list and, eventually, do a binary search. Concern: I would say this option is even less efficient than the first one and doesn't feel right.
Store several maps in the dataset that serve as indexes for the list. I could store maps that index the Persons by age, by name or both. Concern: This would effectively increase the efficiency, as it is the same thing an index in other DB engines does, but:
List<Ref<Person>>
as map value)?Use Ngql. It seems that there is support of GraphQL to query data in a Noms database, but I'm a bit confused about this.
Could you please point me in the right direction? Thank you very much!