Closed sebastian closed 5 years ago
Reference email (for Sebastian) in Missive. TL;DR: Feedback for Robert at TLF
they don't understand what the different column headers actually mean. Solution: provide description somewhere
Straightforward action item that I think should be a separate ticket.
the number of total columns grew over the weekend without any changes to the data source. Could this be the cloak redetermining the schema periodically in the background?
What I could find was (https://github.com/Aircloak/aircloak/blob/master/cloak/lib/cloak/data_source.ex#L114):
Validates that the Cloak can connect to the data source, and updates the online status of the
data source. If the data source has been offline, it also has it's table definitions refreshed.
so it indeed seems possible that a data source gets rescanned, and if columns were added to the underlying database, they will get picked up. I'm not sure why they are surprised, though. Do they not know that columns were added?
the data source vdc1w_dw has been online for a month, yet has incredibly low numbers of analyses operations completed. Why?
Unclear to me how to approach solving such a problem at this time... Perhaps we can talk on slack.
from the stats (diff Friday to Monday) it looks like it is going to take ~500 days before the analysis is complete. I think the system would benefit from a) showing a rough estimate of how long it will take, based on the average per-column time, and b) showing when the next cycle is likely to start again
Given the refresh period is smaller than 500 days, the analysis will never stop - it will just cycle through the columns. Because of that, while I do agree that showing some indication of how long it's going to take and when it will restart would be nice, I don't think it helps in this particular case.
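For reference, the kind of extrapolation behind the ~500 days figure can be sketched as below. This is only a rough sketch, not how the cloak computes anything: the function name, the snapshot dates, and the column counts are all made up for illustration (the real numbers from the Friday and Monday stats are not reproduced here).

```python
from datetime import datetime


def estimate_remaining_days(done_before, done_after, total, t_before, t_after):
    """Extrapolate the remaining analysis time from two progress snapshots.

    Hypothetical helper: assumes the analysis keeps progressing at the
    average rate observed between the two snapshots.
    """
    elapsed_days = (t_after - t_before).total_seconds() / 86400
    completed = done_after - done_before
    if completed <= 0 or elapsed_days <= 0:
        return None  # no measurable progress, so no meaningful estimate
    rate_per_day = completed / elapsed_days
    return (total - done_after) / rate_per_day


# Made-up snapshot values, chosen only so the arithmetic is easy to follow.
friday = datetime(2019, 1, 4, 12, 0)
monday = datetime(2019, 1, 7, 12, 0)
print(estimate_remaining_days(100, 130, 5130, friday, monday))
```

With these made-up values, 30 columns are analysed in 3 days (10 per day), and 5000 columns remain.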
> Unclear to me how to approach solving such a problem at this time... Perhaps we can talk on slack.
This one has been solved (with update that was made today).
The cause was that the LIMIT clauses in the shadow db queries caused those queries to become emulated. This in turn led to entire tables being loaded out of the database, which was slow and incredibly memory hungry.
> Given the refresh period is smaller than 500 days, the analysis will never stop - it will just cycle through the columns. Because of that, while I do agree that showing some indication of how long it's going to take and when it will restart would be nice, I don't think it helps in this particular case.
Well in this case it would have to show that it would be never ending. So I guess the logic would be:
- if analysis time is shorter than repeat time, then show when it will start again
- if analysis time is longer than repeat time, then show that it will likely go on indefinitely
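The two rules above could look roughly like this. It is a minimal sketch, not the actual implementation: `analysis_status`, its parameters, and the message wording are hypothetical, and it assumes the repeat time is measured from the start of the current cycle. The case where both times are equal is not covered by the rules, so it is treated here as "will start again".

```python
from datetime import datetime, timedelta


def analysis_status(cycle_started, analysis_days, refresh_days):
    """Pick the message to show for a data source's analysis cycle.

    Hypothetical helper: `analysis_days` is the estimated time for one full
    pass over the columns, `refresh_days` is the repeat time.
    """
    if analysis_days <= refresh_days:
        # The pass finishes before the next one is due: show the restart date.
        next_start = cycle_started + timedelta(days=refresh_days)
        return f"Next analysis cycle starts around {next_start:%Y-%m-%d}"
    # The pass cannot finish before it is due to repeat.
    return "Analysis will likely go on indefinitely"


# Made-up values: a 30 day repeat time is an assumption, not the real setting.
print(analysis_status(datetime(2019, 1, 1), 10, 30))
print(analysis_status(datetime(2019, 1, 1), 500, 30))
```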
So I guess that's what's left in this issue, is that right?
Correct 👍
I consider all the relevant and useful features to be implemented in this issue.
Here is further feedback on the analyses page from Telefonica:
- they don't understand what the different column headers actually mean. Solution: provide description somewhere (turned into separate issue: https://github.com/Aircloak/aircloak/issues/3494)
- the number of total columns grew over the weekend without any changes to the data source. Could this be the cloak redetermining the schema periodically in the background? Likely reason is change of underlying data source
- the data source vdc1w_dw has been online for a month, yet has incredibly low numbers of analyses operations completed. Why? Reason seems to be slow queries due to emulated LIMIT. A fix has been shipped
From Friday:
From Monday: