Closed: rayanoncyber closed this issue 2 years ago.
Hi @rayanoncyber, where are you getting this error? Do you have a stack trace or similar? I don't recognise this error from this code, so I wonder if it's maybe in Firebase itself?
This is what I get about 5 minutes after running the command
Thanks @Haroenv
Not too sure what causes this, but the error implies the individual connection/snapshot is too "old" and that the data should be processed in batches (e.g. https://stackoverflow.com/questions/64712587/google-cloud-pub-sub-function-gives-the-requested-snapshot-version-is-too-old, https://github.com/firebase/functions-samples/issues/890, https://github.com/firebase/firebase-js-sdk/blob/5ad7ff2ae955c297556223e6cb3ad9d4b897f664/packages/firestore/src/remote/rpc_error.ts#L76)
The error came from the firestore-algolia-search module trying to fetch all documents at once (with big datasets it fails).
I rewrote retrieveDataFromFirestore to process the collection in chunks through a recursive function. Hope it helps other people hitting the same problem:
// `config_1`, `database` and `processQuery` are the existing imports/helpers of the
// extension's import script; only the two functions below were rewritten.
const retrieveChunk = async (lastVisible, maxLength) => {
    const collectionPathParts = config_1.default.collectionPath.split('/');
    const collectionPath = collectionPathParts[collectionPathParts.length - 1];
    let querySnapshot;
    if (lastVisible) {
        // Resume right after the last document of the previous chunk.
        querySnapshot = await database.collection(collectionPath).limit(maxLength).startAfter(lastVisible).get();
    }
    else {
        // First chunk: no cursor yet.
        querySnapshot = await database.collection(collectionPath).limit(maxLength).get();
    }
    // Index this chunk; errors are logged but do not stop the iteration.
    processQuery(querySnapshot).catch(console.error);
    // Return the cursor for the next chunk and the size of the current one.
    return [querySnapshot.docs[querySnapshot.docs.length - 1], querySnapshot.docs.length];
};
const retrieveDataFromFirestore = async (lastVisible = null) => {
    const maxLength = 100;
    const [nextCursor, length] = await retrieveChunk(lastVisible, maxLength);
    console.log("LENGTH", length);
    // A chunk shorter than maxLength means the whole collection has been read.
    if (length === maxLength) return retrieveDataFromFirestore(nextCursor);
};
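Since the function recurses until a chunk comes back shorter than maxLength, a single call drives the whole import. A minimal usage sketch (the logging is illustrative, not part of the extension):

retrieveDataFromFirestore()
    .then(() => console.log('All chunks processed'))
    .catch(console.error);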
This is very useful, thanks! If you are confident in this code, feel free to make a pull request :)
@rayanoncyber did you want to send a PR?
It would have been more practical to implement this instead of closing the ticket, since the issue makes the migration tool unusable for bigger datasets.
Someone has made this change to support large collections:
if (lastVisible) {
    querySnapshot = await database.collection(collectionPath).limit(maxLength).startAfter(lastVisible).get();
} else {
    querySnapshot = await database.collection(collectionPath).limit(maxLength).get();
}
Curious if this approach (using .get()) can guarantee the correct order of execution... Usually, chunking goes together with sorting.
Based on the answers above, I ended up with a small wrapper function that helped me go through all Firestore records:
// Type imports come from the Firestore Admin SDK, e.g.:
// import { CollectionReference, DocumentData, OrderByDirection, Query,
//          QueryDocumentSnapshot, QuerySnapshot } from 'firebase-admin/firestore';
async *iterateAll<T extends DocumentData>(
  collection: CollectionReference<T>,
  orderBy: Extract<keyof T, string>,
  direction: OrderByDirection = 'desc',
  batchSize: number = 2000,
): AsyncGenerator<QueryDocumentSnapshot<T>> {
  let offset = 0;
  let shouldContinue = true;
  do {
    // Read one page of `batchSize` documents, always in the same explicit order.
    const query: Query<T> = collection.orderBy(orderBy, direction);
    const querySnapshot: QuerySnapshot<T> = await query.limit(batchSize).offset(offset).get();
    for (const doc of querySnapshot.docs) {
      offset++;
      yield doc;
    }
    // A page shorter than batchSize means the whole collection has been read.
    if (querySnapshot.docs.length < batchSize) {
      shouldContinue = false;
    }
  } while (shouldContinue);
}
and used it the following way:
for await (const doc of this.iterateAll<IOrder>(firestore.collection('orders'), 'createdAt')) {
  if (!doc.exists) {
    continue;
  }
  const order = doc.data();
  // handle every order here
}
Hope that helps someone 😉
P.S. Instead of createdAt, you can use any field that exists in all records and gives a stable ordering.
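One caveat with the offset-based approach: Firestore still reads (and bills) the documents skipped by offset(), so the later pages of a very large collection get slow and expensive. Below is a hedged sketch of a cursor-based variant that keeps the explicit ordering but paginates with startAfter instead; iterateAllByCursor is an illustrative name, not part of the extension:

// Assumes the Firestore Admin SDK and a field that exists on every document
// and gives a stable sort order.
import { CollectionReference, DocumentData, QueryDocumentSnapshot } from 'firebase-admin/firestore';

async function* iterateAllByCursor<T extends DocumentData>(
  collection: CollectionReference<T>,
  orderBy: string,
  batchSize = 2000,
): AsyncGenerator<QueryDocumentSnapshot<T>> {
  let last: QueryDocumentSnapshot<T> | undefined;
  while (true) {
    let query = collection.orderBy(orderBy).limit(batchSize);
    if (last) query = query.startAfter(last); // cursor: skipped documents are never read
    const snapshot = await query.get();
    for (const doc of snapshot.docs) yield doc;
    if (snapshot.docs.length < batchSize) return; // short page, end of collection
    last = snapshot.docs[snapshot.docs.length - 1];
  }
}

It can be consumed with the same for await...of loop as above.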
Hey!
We're trying to migrate our Firestore data to Algolia (~5 million records) and we keep facing this issue. It seems related to Firestore failing to fetch that much data at once. Is there any known or upcoming fix to fetch in batches?
Thanks a lot!