go-sean-go opened this issue 1 year ago
Maybe this is the same problem as https://github.com/algolia/firestore-algolia-search/issues/171? In v1.1.1 it was working properly.
It could share the same root cause or fix, but the goal is different: this issue deals with overlapping (colliding) document IDs, while #171 deals with overlapping (colliding) collection names.
So I would say it is a different issue; it is also not a bug but a feature request.
This feature is quite important. Any feedback yet?
Hello @go-sean-go, I have created a release candidate (RC) of the extension that lets you change the object ID in the configuration. Please try it out and let me know if it solves your problem. https://console.firebase.google.com/project/_/extensions/install?ref=algolia/firestore-algolia-search@1.2.1-rc.0
@smomin Can you confirm how to use the field?
Are the valid values either (a) (path) or (b) a document field/property name (e.g. authorId or something similar)?
In the case of option (b), must the value of that property remain stable? Or, if it changes over time (some time after creation), would the object ID of the document on the Algolia side change as well?
@go-sean-go The valid values are:
- (path): uses the document path
- authorId (a field name): uses the value of that document attribute
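To make the two options concrete, here is a minimal sketch of how such a setting might be resolved into an Algolia object ID. It is hypothetical: resolveObjectId, the exact "(path)" sentinel handling, and the plain-object document shape are assumptions for illustration, not the extension's actual code.

```javascript
// Hypothetical sketch (not the extension's implementation): turn an
// "object ID" setting into an Algolia objectID for one Firestore document.
// Assumes the setting is either the literal "(path)" or a top-level field name.
function resolveObjectId(setting, docPath, docData) {
  if (setting === "(path)") {
    // Use the full Firestore document path, e.g. "users/User2/feed/Post1".
    return docPath;
  }
  const value = docData[setting];
  if (value === undefined || value === null) {
    throw new Error(`Field "${setting}" is missing on ${docPath}`);
  }
  // Algolia objectIDs are strings, so coerce the field value.
  return String(value);
}
```

For example, resolveObjectId("authorId", "users/User2/feed/Post1", { authorId: "User1" }) yields "User1", while the "(path)" setting yields the path itself.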
Thanks @smomin. Regarding my other question, using e.g. authorId: must the value of that property remain stable? Or, if it changes over time (some time after creation), would the object ID of the document on the Algolia side change as well?
Basically, if the authorId value changes, what happens? Does the object ID at Algolia change? Does it only consider the value at the initial sync and leave it alone afterward? Or is it unaware of the Algolia state, so it simply creates a new record (which would leave the old one orphaned)?
I'm trying to understand the practical use case for the property-value option.
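The orphaning concern can be illustrated with a small, hypothetical update planner. If the object ID is derived from a mutable field, an update that changes that field produces a new ID, and unless the old record is deleted explicitly it stays in the index. planUpdate and its { upsert, remove } result are assumptions for illustration; the thread does not say how the RC actually behaves.

```javascript
// Hypothetical sketch (not the extension's code): decide which Algolia
// records to write and delete when a Firestore document is updated and the
// objectID is derived from a document field such as authorId.
function planUpdate(idField, before, after) {
  const oldId = String(before[idField]);
  const newId = String(after[idField]);
  if (oldId === newId) {
    // ID is stable: a single upsert replaces the existing record in place.
    return { upsert: newId, remove: null };
  }
  // ID changed: writing newId creates a second record, so oldId must be
  // deleted explicitly or it is left orphaned in the index.
  return { upsert: newId, remove: oldId };
}
```

The sketch also assumes the field value is unique per document: two documents sharing the same authorId would still collide on a single record, which is the original problem in a different form.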
Separately, a question on the (path) option: what are the limits on the Algolia side? I ask because Firestore's limits on a path are rather extreme (per its docs): subcollections can be nested up to 100 levels deep, document IDs can be up to 1,500 bytes, and document names can be up to 6 KiB. That means these paths could be thousands of characters long. I imagine Algolia will choke on that? Maybe not; maybe a 1,000-character Algolia object ID is fine, but it is probably not preferred.
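Since Firestore states its limits in bytes, any check of a path against an Algolia object ID limit should also count UTF-8 bytes rather than JavaScript string length. A minimal sketch, assuming a runtime with the global TextEncoder (Node.js 11+ or a browser); utf8ByteLength and fitsObjectIdLimit are hypothetical helpers, and the limit is a parameter because the thread does not establish Algolia's actual value.

```javascript
// Count UTF-8 bytes, which is how Firestore states its limits
// (1,500 bytes per document ID, 6 KiB per document name).
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Hypothetical check: would this path fit under an objectID limit of maxBytes?
// maxBytes is deliberately a parameter: look up Algolia's real limit.
function fitsObjectIdLimit(docPath, maxBytes) {
  return utf8ByteLength(docPath) <= maxBytes;
}
```

For instance, "users/José/feed/Post1" is 21 characters but 22 bytes, because "é" takes two bytes in UTF-8.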
Given these scenarios, may I suggest a simpler feature that solves the original issue and avoids these edge cases (which would likely be unintended on the user side): a simple checkbox option that hashes the full path (potentially a very long string) to a fixed, UUID-length string. This would provide practical uniqueness (collisions are vanishingly unlikely with a cryptographic hash), idempotency (the same path always yields the same ID), and, if I'm reading the quotas right, it would handle even the extreme edges of Firestore's limits.
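A minimal sketch of the suggested option, assuming Node.js's built-in crypto module: SHA-256 the full document path and format the first 128 bits as a UUID-shaped string. hashedObjectId and the truncation to 128 bits are illustrative choices, not anything the extension offers.

```javascript
const { createHash } = require("crypto");

// Hypothetical option: derive a fixed-length, deterministic objectID from a
// Firestore document path of any length.
function hashedObjectId(docPath) {
  // SHA-256 yields 64 hex chars; keep the first 32 (128 bits).
  const hex = createHash("sha256").update(docPath, "utf8").digest("hex").slice(0, 32);
  // Format as 8-4-4-4-12 so it is UUID-length. It is not an RFC 4122 UUID:
  // no version or variant bits are set.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

Because the ID depends only on the path, re-syncing the same document overwrites its own record, while users/User2/feed/Post1 and users/User3/feed/Post1 map to different records. The trade-off is that the ID is opaque; keeping the original path as an ordinary attribute on the record would preserve it for filtering or debugging.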
Anyway, just my two cents. Let me know about the above.
Hey @go-sean-go, sorry for missing this. Are your concerns still valid? Let me know your feedback on the RC release.
I haven't re-tested since my original comment - so I'm not sure. If no changes have been made to the feature/code, then yeah, my questions would still be outstanding.
Overall, per my comments above, I don't think the current solution is very durable or clear. The hash mechanism I suggested above might be worth exploring (it's not my area of expertise), as long as the hash has a sufficiently large output space to avoid collisions.
To repeat my concerns:
The (path) approach is flawed because Firestore allows very deep subcollection nesting and very long paths (meaning someone using Firestore in a valid, supported way could probably exceed Algolia's object ID limits, I'm guessing?).
(see Question at the end)
I have synced an index between Firestore and Algolia. In my case it's a subcollection of feed items: when User1 creates a post (let's call it Post1), if User2 and User3 follow User1, they each get a copy of Post1 in a subcollection.
Here is the data structure, then, after creation/replication:
You may already see the problem here: I have 3 documents with the same document ID (Post1).
In Firestore land, this is totally fine; typical use cases query subcollections with this sort of syntax:
db.collection(`users/${userId}/feed`).get()
In other words, the query is already scoped to one user's feed, so the returned items cannot contain duplicate IDs. In Algolia, however, it appears that with this configuration each copy overwrites the previous one, so only one copy of Post1 remains in the end.
Question: perhaps this can be solved with a function set via the Transform Function Name parameter? If so, how would I change the document ID? I can experiment with this next week, but the documentation here is very light, so I don't want to sink too much time into it if the maintainers can simply speak to it here.
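The collision can be reproduced without Firestore or Algolia at all. The sketch below models the index as a Map keyed by objectID, where saving an existing ID overwrites it; the three paths are assumed for illustration, and keying by document ID versus full path stands in for the default behavior and the (path) setting.

```javascript
// Assumed example paths: three copies of Post1 in different users' feeds.
const paths = [
  "users/User1/feed/Post1",
  "users/User2/feed/Post1",
  "users/User3/feed/Post1",
];

// Model an Algolia index as a Map: saving an existing objectID overwrites it.
function syncAll(docPaths, toObjectId) {
  const index = new Map();
  for (const path of docPaths) {
    index.set(toObjectId(path), { path });
  }
  return index;
}

// Keyed by the last path segment (the document ID): every copy collides.
const byDocId = syncAll(paths, (p) => p.split("/").pop());
// Keyed by the full path: each copy keeps its own record.
const byPath = syncAll(paths, (p) => p);

console.log(byDocId.size); // 1
console.log(byPath.size); // 3
```

With document-ID keys, the surviving record is whichever copy was synced last (here users/User3/feed/Post1), which matches the "only one copy in the end" symptom.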