Open collinhundley opened 2 weeks ago
I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
I'm not working at Google, but I was a contributor of some of this functionality, so I have some input. :-)
The encoder and decoder shipped with Firebase is based on the open source implementation that was part of Swift until it was recently replaced by a much more performant implementation.
The 'old' Swift implementation of JSONDecoder started out with Data
, then used (an Objective-C implementation of) JSONSerialization
to convert the Data
into a structure built from Dictionaries and Arrays of data, representing the JSON structure. After this, the structure was decoded into the actual Decodable
types.
Since both Firestore and the Realtime Database have similar structural representations (nested dictionaries and arrays), the decoder in firebase is basically a copy of what Swift used to do, but it skipped the initial conversion from Data
to a structure performed by JSONSerialization
.
So this is basically what it is. There are also some details in that the structural representation in Firestore can contain types that are not from JSON - like the built-in Timestamp
and GeoPoint
, but all in all it's just a copy of what the Swift Foundation JSONDecoder
used to do.
Since then, JSONSerialization
was re-written - and JSONDecoder
too. JSONSerialization
has an internal mode to not convert into Dictionaries and Arrays, but represent the parsed data as an enum where each case represents a JSON value type.
This representation is then used in the re-implemented JSONDecoder
and the stronger typing is probably part of why it's so much faster.
So - that's the state of the current decoder - and perhaps partly an explanation about why the built-in JSONDecoder
is faster.
Would it be possible to use the newer Foundation implementation in Firebase too?
The enum representation of the JSON structure is entirely internal to Foundation
, so it can't directly be used. But it could of course be cloned.
The issue would then be to get Firestore to fetch snapshots in this optimized structure rather than as structures of Dictionaries and Arrays. I haven't looked at the Firestore code enough to be able to judge if that is possible.
There is also the question of the 'native' Firestore types that are not JSON-compatible.
With regard to your question of parallelization, there's nothing in the actual decoder that does any form of synchronization. So I'm guessing that it's actually the accessing of the value on a Firestore document snapshot that does some synchronization. One guess would be that it fetches data from a shared caching layer, and access to that could require the synchronization?
You could try instrumenting your sample code above to see where most of the time is being spent in both the serial and parallel examples - perhaps there are some low hanging fruits?
All of this is just my 2 cents. :-)
@collinhundley, I'm marking this as a feature request for a more performant Firestore.Decoder. We will continue to listen for community feedback to help us prioritize the effort.
In the meantime, your approach of reducing the time to decode seems reasonable. Are you able to instrument your code as @mortenbekditlevsen suggested, or even add some log statements simple logging to confirm that your tasks are indeed running in parallel? Are you running these tests on a simulator or on an actual device?
@mortenbekditlevsen thank you for the detailed response.
With regards to parallelization, I did some further testing which leads me to believe that document access is not the bottleneck here. We can remove document access from the equation by converting all documents to dictionaries up front:
let dictStart = CACurrentMediaTime()
let dictionaries = documents.map{$0.data()}
print(CACurrentMediaTime() - dictStart) // 0.75s
let chunkedDictionaries = dictionaries.chunked(into: 5000) // [[[String : Any]]]
let decodeStart = CACurrentMediaTime()
let items = await withTaskGroup(of: [Item].self) { group in
for chunk in chunks {
group.addTask {
let decoder = Firestore.Decoder()
return chunk.compactMap{try? decoder.decode(Item.self, from: $0)}
}
}
var items = [Item]()
for await result in group {
items.append(contentsOf: result)
}
return items
}
print(CACurrentMediaTime() - decodeStart) // 10.9s
So 94% of the time is spent on actual decoding.
Profiling confirms that 10 threads are fully utilized concurrently on my M1 Max. Here's a sample from one of these threads, expanded a few levels:
100.0% 0 s closure #1 in closure #2 in DataLayer.getAllItems(for:source:)
100.0% 0 s specialized Sequence.compactMap<A>(_:)
100.0% 0 s specialized Sequence.compactMap<A>(_:)
100.0% 0 s closure #1 in closure #1 in closure #2 in DataLayer.getAllItems(for:source:)
100.0% 1.00 ms FIRFirestore.Decoder.decode<A>(_:from:)
99.9% 2.00 ms FirebaseDataDecoder.decode<A>(_:from:)
99.9% 0 s __JSONDecoder.unbox<A>(_:as:)
99.9% 0 s __JSONDecoder.unbox_(_:as:)
99.4% 0 s protocol witness for Decodable.init(from:) in conformance Item
99.4% 1.00 ms Item.__allocating_init(from:)
99.4% 2.00 ms Item.init(from:)
96.3% 0 s protocol witness for KeyedDecodingContainerProtocol.decode<A>(_:forKey:) in conformance _JSONKeyedDecodingContainer<A>
2.8% 1.00 ms protocol witness for KeyedDecodingContainerProtocol.decodeIfPresent(_:forKey:) in conformance _JSONKeyedDecodingContainer<A>
0.3% 0 s protocol witness for Decoder.container<A>(keyedBy:) in conformance __JSONDecoder
0.0% 0 s protocol witness for KeyedDecodingContainerProtocol.decodeIfPresent<A>(_:forKey:) in conformance _JSONKeyedDecodingContainer<A>
0.4% 32.00 ms static FirestorePassthroughTypes.isPassthroughType<A>(_:)
0.1% 4.00 ms dynamic_cast_existential_1_conditional
0.0% 0 s _JSONDecodingStorage.push(container:)
0.0% 2.00 ms FirebaseDataDecoder.__allocating_init()
0.0% 1.00 ms FIRFirestore.Encoder.keyEncodingStrategy.setter
I'm a little stumped - any other ideas where synchronization might be occurring?
CC @MarkDuckworth
What do you see if you print elapsed time at the beginning of each task? Or maybe print a start and stop identifier for each task?
...
for chunk in chunks {
group.addTask {
print("start decoding \(chunkIdentifier)")
print(CACurrentMediaTime() - decodeStart)
let decoder = Firestore.Decoder()
let result = chunk.compactMap{try? decoder.decode(Item.self, from: $0)}
print("stop decoding \(chunkIdentifier)")
return result
}
}
Ok, another update:
By manually testing a variety of batch sizes, I am able to squeeze out some extra performance by reducing the number of tasks to 4 (with 12,500 documents each). The decoding time is reduced to 5.2s - down from the original 9.6s single thread.
So I was wrong to say that the decoder can't be parallelized at all. However, there seems to be a lot of performance left on the table... I originally chose 10 tasks to correspond to my 10 M1 Max cores, and I know that's the limit that TaskGroup
will run concurrently on my machine.
Are we maxing out some other resource then? I can't manually tweak the batch size for every CPU out there... it would be preferable to let TaskGroup
take advantage of all available cores.
What do you see if you print elapsed time at the beginning of each task? Or maybe print a start and stop identifier for each task?
@MarkDuckworth I've done exactly this, and interestingly they all start and stop at nearly the same time:
for (i, chunk) in chunks.enumerated() {
group.addTask {
print("Start \(i): \(CACurrentMediaTime())")
let decoder = Firestore.Decoder()
let res = chunk.compactMap{try? decoder.decode(Item.self, from: $0)}
print("Finish \(i): \(CACurrentMediaTime())")
return res
}
}
Start 0: 87516.98317362502
Start 1: 87516.98320329169
Start 2: 87516.98323158335
Start 3: 87516.98324508335
Start 4: 87516.98336229169
Start 5: 87516.98339979169
Start 6: 87516.98344200003
Start 7: 87516.98347045835
Start 8: 87516.98390000002
Start 9: 87516.98391875002
Finish 6: 87527.66197037502
Finish 0: 87527.68356191668
Finish 5: 87527.68613600002
Finish 2: 87527.68647504169
Finish 1: 87527.68823379169
Finish 7: 87527.69325829169
Finish 4: 87527.69522533336
Finish 3: 87527.69923245836
Finish 9: 87527.70061004169
Finish 8: 87527.70436450002
Wouldn't you expect the efficiency cores to take a bit longer than the rest?
Description
Decoding an array of
QuerySnapshotDocument
usingFirestore.Decoder
is very slow, compared to decoding a similar array usingJSONDecoder
:What's more is that we don't gain any performance from parallelization. For example, if we split
documents
into 10 arrays of 5k documents each, and initialize oneFirestore.Decoder
for each group:...the total time is worse: about 10.3s.
So there are 2 issues:
1) Why is
Firestore.Decoder
4x slower thanJSONDecoder
? 2) DoesFirestore.Decoder
utilize some shared internal state that prevents instances from running independently on different threads?Reproducing the issue
No response
Firebase SDK Version
10.29
Xcode Version
16
Installation Method
Swift Package Manager
Firebase Product(s)
Firestore
Targeted Platforms
iOS, macOS
Relevant Log Output
No response
If using Swift Package Manager, the project's Package.resolved
Expand
Package.resolved
snippet```json Replace this line with the contents of your Package.resolved. ```
If using CocoaPods, the project's Podfile.lock
Expand
Podfile.lock
snippet```yml Replace this line with the contents of your Podfile.lock! ```