Closed qais-yousef closed 5 years ago
Hmm, tests/test_idle.py fails now because the test has duplicated entries in the series being tested.
@valschneider I seem to remember that at some point we considered using the line number in conjunction with the timestamp to ensure a unique index.
I'm still wondering whether that should not be the way to go, instead of playing with fixing up timestamps in `handle_duplicate_index`.
> I seem to remember that at some point we considered using the line number in conjunction with the timestamp to ensure a unique index. I'm still wondering whether that should not be the way to go, instead of playing with fixing up timestamps in `handle_duplicate_index`.
A quick search on pandas turns up multi-index support, but the impact on performance, and how much of the code would need to change to deal with it, warrants a proper investigation.
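For illustration only (hypothetical event names and values), a (timestamp, line number) MultiIndex would indeed make the index unique, but the shape of selection results changes, which is the kind of API impact that would need investigating:

```python
import pandas as pd

# Hypothetical data: two events share the 0.2 timestamp, as happens
# with low-resolution simulated time.
timestamps = [0.1, 0.2, 0.2, 0.3]
lines = [10, 11, 12, 13]
events = pd.DataFrame(
    {"event": ["wakeup", "switch", "switch", "idle"]},
    index=pd.MultiIndex.from_arrays([timestamps, lines],
                                    names=["Time", "__line"]),
)
assert events.index.is_unique  # duplicates in "Time" alone no longer collide

# Selecting on the first level still works, but the result is indexed by
# the remaining level only, so callers expecting a flat time index break:
at_0_2 = events.loc[0.2]  # both events at t=0.2, indexed by __line
```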
Removing the duplicates seems like an easier way forward? Can you think of disadvantages or things that could break?
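For context, a common way this kind of timestamp fixup works (a sketch only, not necessarily what trappy's `handle_duplicate_index` actually does) is to nudge each repeated timestamp forward by a tiny delta, keeping the index unique and the event order intact:

```python
import pandas as pd

def fixup_duplicate_index(series, delta=1e-6):
    """Make a float time index unique by nudging duplicates forward.

    Illustrative sketch only; the real handle_duplicate_index may
    behave differently.
    """
    idx = series.index.to_series()
    # cumcount() numbers repeats of each timestamp 0, 1, 2, ... in order
    offsets = idx.groupby(idx).cumcount().to_numpy()
    return pd.Series(series.to_numpy(), index=series.index + offsets * delta)

s = pd.Series([1, 2, 3, 4], index=[0.1, 0.2, 0.2, 0.3])
fixed = fixup_duplicate_index(s)
assert fixed.index.is_unique                   # 0.2 becomes 0.2 and 0.200001
assert list(fixed.to_numpy()) == [1, 2, 3, 4]  # values and order preserved
```

The obvious downside is that the stored timestamps no longer match the trace exactly, which matters to anything that compares them against the raw trace.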
> @valschneider I seem to remember that at some point we considered using the line number in conjunction with the timestamp to ensure a unique index. I'm still wondering whether that should not be the way to go, instead of playing with fixing up timestamps in `handle_duplicate_index`.
It would ensure uniqueness, yes, but using {line number, timestamp} as a dual index is not an option for us - that would break the API (and sorting/slicing performance would take a significant hit). Or did you mean something else?
We could use the line number to ensure correct ordering of events with duplicate timestamps, at least with regard to the trace, but I think that's already the case by construction.
Yes, I guess using the line number at least to ensure correct ordering of fixed-up timestamps is the bare minimum we can do without breaking APIs... that would require having a look at how `handle_duplicate_index` handles duplications and adding support for the line number as an additional check.
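As a sketch of that minimal option (column names assumed for illustration): keep each event's trace-file line number in a `__line` column and use it as a tie-breaker, so that events sharing a timestamp stay in trace order:

```python
import pandas as pd

# Assumed layout: the parser records each event's trace-file line number
# in a "__line" column. Sorting on (Time, __line) with a stable sort
# keeps events that share a timestamp in their original trace order.
df = pd.DataFrame({
    "Time":   [0.2, 0.1, 0.2, 0.3],
    "__line": [12, 10, 11, 13],
    "event":  ["c", "a", "b", "d"],
})
ordered = df.sort_values(["Time", "__line"], kind="mergesort").set_index("Time")
assert list(ordered["__line"]) == [10, 11, 12, 13]
assert list(ordered["event"]) == ["a", "b", "c", "d"]
```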
BTW: how does kernelshark handle and visualize those duplicated events?
> BTW: how does kernelshark handle and visualize those duplicated events?
Nothing special about them. You just see them all appearing at the same point in time.
I added a new commit to handle the `__line` column if it exists. I'm running some tests against it now to make sure that fastmodel doesn't break in new ways :)
> duplicated timestamps happen frequently and cause some failures down the line
Can you elaborate on those "failures down the line"? We never used it in dataframe generation because we didn't see the need for it.
> Can you elaborate on those "failures down the line"? We never used it in dataframe generation because we didn't see the need for it.
My original attempt to fix the problem, with verbose commit messages pointing at the exact points of failure, is here: https://github.com/ARM-software/lisa/pull/864
I could expand the commit message here too, I guess, but felt it'd be rather noisy. I'd be happy to include the backtraces from the 2 commit messages in the lisa PR referenced above if you want.
> I'd be happy to include the backtraces from the 2 commit messages in the lisa PR referenced above if you want.
That, or just a reference to Arm-software/lisa#864. Either is fine.
While you are at it, can you remove the call to `handle_duplicate_index` in `stats/grammar.py`? It shouldn't be needed now.
> While you are at it, can you remove the call to `handle_duplicate_index` in `stats/grammar.py`? It shouldn't be needed now.
Sure, done. I updated the PR to include that, and I reference https://github.com/ARM-software/lisa/pull/864 in the commit message that introduces `handle_duplicate_index()` in base.py.
My run with the fastmodel seems to be happy with this. I'll do another run to verify the new change to grammar.py isn't causing any unexpected behavior (although I'm not sure whether that code is being exercised or not).
> I'll do another run to verify the new change to grammar.py isn't causing any unexpected behavior (although I'm not sure whether that code is being exercised or not).
Ok, let me know when you are happy and I'll merge it.
Thanks! Lisa ran without a problem, so hopefully this means it at least doesn't break anything obvious and is good to go :)
When running Lisa tests on fastmodels, duplicated timestamps happen frequently and cause some failures down the line.
The duplicated timestamps are not bad in themselves; they are a side effect of simulated time not having good enough resolution.
Handling this duplication at dataframe creation time seems to be the most generic and future-proof solution.
Signed-off-by: Qais Yousef qais.yousef@arm.com