Feature request: modify document ID during sync

go-sean-go commented 1 year ago

(see Question at the end)

I have sync'd an index between Firestore and Algolia. In my case it's a subcollection of feed items, such that when User1 creates a post (let's call it Post1), if User2 and User3 follow User1, they both get a copy of Post1 in a subcollection.

Here is the data structure, then, after creation/replication:

// posts/{postId}
// users/{userId}/feed/{postId}

posts/Post1 // authorId === User1
users/User1/feed/Post1 // User1's copy of their own post
users/User2/feed/Post1 // User2's copy of the post
users/User3/feed/Post1 // User3's copy of the post

You may already see the problem here: I have 3 documents with the same document ID (Post1).

In Firestore land, this is totally fine; typical use cases generally query subcollections with this sort of syntax: query.collection(`users/${userId}/feed`).get() - in other words, you've already automatically filtered the returned feed items to prevent duplicate IDs.

In Algolia, it appears that with this configuration, the documents are simply re-written many times, and there is only one copy of Post1 in the end.
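
The overwrite described above can be reproduced with a small simulation. This is only a sketch, not the extension's code: it assumes each record's objectID is the bare Firestore document ID (which is what the observed behavior suggests), and it uses a plain Map as a stand-in for the Algolia index. The helper name objectIdFromDocId is hypothetical.

```javascript
// Simulated index: a Map keyed by objectID (stand-in for an Algolia index).
const index = new Map();

// Assumption: the objectID is the last path segment, i.e. the document ID.
function objectIdFromDocId(docPath) {
  return docPath.split("/").pop();
}

const feedCopies = [
  "users/User1/feed/Post1",
  "users/User2/feed/Post1",
  "users/User3/feed/Post1",
];

for (const docPath of feedCopies) {
  // Saving a record under an existing objectID replaces the earlier record.
  index.set(objectIdFromDocId(docPath), { path: docPath });
}

console.log(index.size); // 1
console.log(index.get("Post1").path); // users/User3/feed/Post1
```

Three writes go in, but only the last copy survives, because all three share the key Post1.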

Question: perhaps this can be solved with a Transform Function Name function? If so, how would I change the document Id...? I can experiment with this next week, but the documentation here is very light, so I don't want to sink too much time into it if the maintainers can simply speak to it here.

maiconkf commented 1 year ago

Maybe it is the same problem as https://github.com/algolia/firestore-algolia-search/issues/171? It was working properly in v1.1.1.

go-sean-go commented 1 year ago

Could be the same root cause/fix, but the goal is different: this issue/request deals with overlapping/colliding document IDs, and #171 is dealing with overlapping/colliding collection names.

So I would say it is a different thing; this is also not a bug, but a feature request.

andrewkimjoseph commented 1 year ago

This feature is quite important. No feedback yet?

smomin commented 9 months ago

Hello @go-sean-go, I have created an RC of the extension that allows you to change the object ID in the configuration. Please try it out and let me know if it solves your problem. https://console.firebase.google.com/project/_/extensions/install?ref=algolia/firestore-algolia-search@1.2.1-rc.0

go-sean-go commented 9 months ago

@smomin Can you confirm how to use the field?

Are the valid values either (a) (path) or (b) a document field/property name (e.g. authorId or something)?

In the case of option (b), must the value of the property name remain stable? Or, if it changes over time (some time after create), would the object ID of the document on the Algolia side change as well?

smomin commented 9 months ago

@go-sean-go

The valid values are below:

(path): uses the document path as the object ID
authorId (or another field name): uses the value of that document attribute
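
To make the two options concrete, here is a minimal sketch of how such a setting could be resolved. It is an illustration under assumptions, not the RC's actual implementation: the function resolveObjectId, its parameters, and its error handling are all hypothetical.

```javascript
// Hypothetical helper: picks the objectID from the configured setting.
// "(path)" -> full document path; anything else -> treated as a field name.
function resolveObjectId(setting, docPath, data) {
  if (setting === "(path)") {
    return docPath;
  }
  const value = data[setting];
  if (value === undefined || value === null) {
    throw new Error(`Field "${setting}" is missing on ${docPath}`);
  }
  return String(value);
}

const docPath = "users/User2/feed/Post1";
const data = { authorId: "User1", title: "Hello" };

console.log(resolveObjectId("(path)", docPath, data)); // users/User2/feed/Post1
console.log(resolveObjectId("authorId", docPath, data)); // User1
```

Note that if every feed copy of Post1 carries the same authorId, the field option would still map all copies to one ID; only the path is distinct per copy in the data structure from the original post.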

go-sean-go commented 9 months ago

thanks @smomin - regarding my other question using e.g. the authorId:

must the value of the property name remain stable? Or, if it changes over time (some time after create), would the object ID of the document on the Algolia side change as well?

Basically, if the authorId value changes, what happens? Does it change the object ID at Algolia? Or does it only consider the value on initial sync and leave it alone afterward? Or, would it be naive of the Algolia state and simply create a new document? (Would that leave the old one orphaned?)
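
The "orphaned" outcome can be sketched as follows. This shows only what a naive sync would do; it says nothing about how the RC actually behaves. A Map stands in for the Algolia index, and naiveSync is a hypothetical function.

```javascript
// Simulated index keyed by objectID (stand-in for an Algolia index).
const index = new Map();

// Naive sync: derives the objectID from a field on every write and
// does not remember which objectID it used previously.
function naiveSync(data, idField) {
  index.set(String(data[idField]), { ...data });
}

// First sync, then the field used as the ID changes and the doc syncs again.
naiveSync({ authorId: "User1", title: "Hello" }, "authorId");
naiveSync({ authorId: "User9", title: "Hello" }, "authorId");

console.log([...index.keys()]); // [ 'User1', 'User9' ]
```

Under that assumption the record stored under User1 is never updated or deleted again. Avoiding this would require the sync to know the previous field value and delete the old record.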

I'm trying to consider the practical use case for the property value option.

Separately, a question on the (path) option: what are the limits here on the Algolia side? I ask because Firestore's limits on a path are rather extreme (per their docs): subcollections can nest up to 100 levels deep, each document ID can be up to 1,500 bytes, and document names can be up to 6 KiB, etc. Meaning these paths could be thousands of characters. I imagine Algolia will choke on that? But maybe not - maybe a 1,000-character Algolia ID is fine...? But probably not preferred.

Considering these scenarios, if I might suggest a simpler feature to solve the original issue + avoid these scenarios (which would likely be unintended from the user side): perhaps the extension could offer a checkbox option that hashes the full path (potentially a VERY LONG STRING) to a fixed, UUID-length string. This would provide uniqueness, idempotency, and I believe handle even extreme edges of the Firestore quotas - if I'm reading it right & thinking about it right.
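
A rough sketch of the hashing idea, using Node's built-in crypto module. The helper name hashedObjectId, the choice of SHA-256, and the truncation to 32 hex characters are all assumptions made for illustration; nothing here is an existing option of the extension.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: maps a document path of any length to a
// fixed-length, deterministic ID (SHA-256, truncated to 32 hex chars).
function hashedObjectId(docPath) {
  return createHash("sha256").update(docPath, "utf8").digest("hex").slice(0, 32);
}

const shortPath = "users/User2/feed/Post1";
const longPath = "users/" + "x".repeat(1500) + "/feed/" + "y".repeat(1500);

console.log(hashedObjectId(shortPath).length); // 32
console.log(hashedObjectId(longPath).length); // 32

// Same input always gives the same ID, so re-syncing is idempotent.
console.log(hashedObjectId(shortPath) === hashedObjectId(shortPath)); // true
```

Thirty-two hex characters carry 128 bits, the same size as a UUID, so distinct paths get distinct IDs except in the case of a hash collision. The hash is one-way, so the original path would need to be stored as a record attribute if it has to be recovered later.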

Anyway, just my two cents. Let me know about the above.

smomin commented 8 months ago

Hey @go-sean-go, sorry for missing this, but are your concerns still valid? Let me know your feedback on the RC release.

go-sean-go commented 8 months ago

I haven't re-tested since my original comment - so I'm not sure. If no changes have been made to the feature/code, then yeah, my questions would still be outstanding.

Overall, per my comments above, I don't think the current solution is very durable or clear. The hash mechanism I suggested above would be something to explore (not my area of expertise), I believe - as long as it has a sufficiently large capacity.

To repeat my concerns:

I don't actually understand the inner workings of the property-name approach, but I have some inferred questions above.
I believe the (path) approach is flawed because Firestore has very extreme limits on the number of subcollections and so on (meaning: someone using Firestore in a valid/supported way will probably exceed Algolia's object ID limits, I'm guessing?).
I think the right solution to this problem is probably to offer some idempotent mechanism to generate a new, unique document ID - or do nothing bespoke in this extension, but allow the user to modify it with whatever arbitrary code (cloud function?) they like (my understanding is that the existing transform functions don't let you modify the object ID, but that could be incorrect?).

Feature request: modify document ID during sync #170