Closed NicoRicardi closed 7 months ago
Attention: 2 lines
in your changes are missing coverage. Please review.
Comparison is base (
02da102
) 74.55% compared to head (4205cef
) 74.63%.
Files | Patch % | Lines |
---|---|---|
kgforge/core/conversions/dataframe.py | 91.66% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I'm not entirely sure why it is necessary to have nested iterations over items? If I understand correctly, you get the pre, and then when you have one pre, you iterate over everything trying to find everything that is a path starting in pre, but I would think there's a way to do it without such time complexity
https://stackoverflow.com/questions/54776916/inverse-of-pandas-json-normalize there could be some inspiration here?
I've also not looked at it in enough detail but deflatten method seems to be used with rather "controlled" input -> extracted from a dataframe. So the cases for which there are ValueError could not be given as input. It really depends if we see someone use the function as it is and providing their own items, in which case, yes, you should check possibly faulty input. But here, it looks like it's an internal usage function with controlled input where these errors wouldn't be thrown.
I've also not looked at it in enough detail but deflatten method seems to be used with rather "controlled" input -> extracted from a dataframe. So the cases for which there are ValueError could not be given as input. It really depends if we see someone use the function as it is and providing their own items, in which case, yes, you should check possibly faulty input. But here, it looks like it's an internal usage function with controlled input where these errors wouldn't be thrown.
There is no check - to the best of my understanding - that one does not have a column 'agent' and a column 'agent.something in their DataFrame'. As such the error makes sense
I'm not entirely sure why it is necessary to have nested iterations over items? If I understand correctly, you get the pre, and then when you have one pre, you iterate over everything trying to find everything that is a path starting in pre, but I would think there's a way to do it without such time complexity
At least at a first glance, the complexity is the same as the previous implementation. I feel like the problem seems easier at first, but I really do not see an easy way out of such complexity. I will look at the link you shared
You have both nested loops and recursivity, the complexity is higher in this implementation
You have both nested loops and recursivity, the complexity is higher in this implementation
I missed a bit of code, indeed that was already the case
As of now, deflatten assumed that all items are ordered according to the nesting. Different input would lead to several values being discarded. For instance
[('p1.id', 'id1'), ('p1.type', 'type1'), ('p2.id', 'id2'), ('p2.type', 'type2'), ('p1.x', 'x1'), ('p2.x', 'x2')]
would cause such issue. Furthermore, if a key appears both as nested and not nested (e.g. 'p1', and 'p1.x') the non-nested value would be discarded without providing any warning.The current version ensures that:
Two tests have been added to check this. Closes #378