Open mishamsk opened 1 year ago
So when something breaks during deserialization, the current basic functionality is unusable. Even if I could have inspected the 300MB worth of text, finding what caused an error is impossible. The outer context is not helpful, on the contrary it is detrimental. A typical example - I've changed some model and trying to code a migration from the old one.
This is a very good example of why this issue should be addressed. To be honest, I've come across this problem myself.
They way I've fixed it is by using an unwrapped helper: code and doc
I really like this helper. Good job!
It is almost ideal, but there is one thing that can't be done outside of mashumaro - namely, I can't get the index of child if it is part of a sequence.
I don't see any another solution other that changing the generated sequence unpacker code to something like this:
def sequence_unpacker(seq):
result = []
for idx, el in enumerate(seq):
try:
result.append(int(el))
except Exception:
raise SomeException(seq, pos=idx)
But it will cost us a speed reduction because in the current implementation we are getting [int(el) for el in seq]
. We might introduce a new "debugability" config option for those who want sacrifice performance in favour of improved error handling.
And then either:
make what my helper does the new behavior make it a config option to let InvalidFieldValue behave in such a way for backward compatibility
I think we should start from a config option and then see how it goes. If we manage to keep sequence debugability without relying on stack trace and avoid notable impact on performance, we can make this behavior default.
I can try my hand at this if the concept seems right. Didn't want to do a PR before we discuss this
The concept is definitely right and this is a very important issue. If you need any help, let me know.
@Fatal1ty read the code, indeed adding an index seems like:
for (b), maybe this pattern would be less degrading:
def sequence_unpacker(seq):
result = []
try:
for idx, el in enumerate(seq):
result.append(int(el))
except Exception:
raise SomeException(seq, pos=idx)
but still a real impact.
I'll draft something on one of the upcoming weekends
Is your feature request related to a problem? Please describe. First, this has been discussed in #83 and dismissed. But I'd like to raise the question and propose a solution yet again.
For the context, which is the same as in #135 - I am working with deeply nested structures, millions of them. For some context - some of the serialized trees that I have are 300MB in msgpack! (that's >1GB in memory after deserialization). And they have ONE root.
So when something breaks during deserialization, the current basic functionality is unusable. Even if I could have inspected the 300MB worth of text, finding what caused an error is impossible. The outer context is not helpful, on the contrary it is detrimental. A typical example - I've changed some model and trying to code a migration from the old one.
They way I've fixed it is by using an unwrapped helper: code and doc
It is almost ideal, but there is one thing that can't be done outside of mashumaro - namely, I can't get the index of child if it is part of a sequence. Only the field name. So for model like this (omitting data class and base class for brevity):
if I say pass a string that is not an integer in a leaf Child node 10 levels deep, my helper will tell me the path
children.children.[8 times more].attr
with mashumaro's inner exception. But I won't know the indexes along the way.Describe the solution you'd like At least for the
InvalidFieldValue
exception add path capturing, including the index of item in a sequence.And then either:
Describe alternatives you've considered See above
Additional context I can try my hand at this if the concept seems right. Didn't want to do a PR before we discuss this