Closed dlebauer closed 7 years ago
make sure to use explain on some of these queries and try to figure out what happens. The traits_and_yields_view is very expensive since it always does 2 selects that are combined.
We may want to consider using materialized views (sort of like caching a view so it doesn't have to be regenerated every time).
@gsrohde I set up daily jobs on bety6 to run
refresh materialized view traits_and_yields_view;
refresh materialized view traits_and_yields_view_private;
Is there a better way to automate this?
@dlebauer Did you use a cron job? If so, which crontab did you use?
I think this is OK as far as it goes, but by themselves, these two commands aren't sufficient: You also need to refresh traitsview_private
and (if used) yieldsview_private
. Moreover, the refreshes should be done in order of dependency unless you want to wait an extra day to see the result of changes to the traits or yields tables. In other words, use the following order:
traitsview_private
and yieldsview_private
(in either order).traits_and_yields_view_private
.traits_and_yields_view
.Note that refreshing traitsview_private
takes a while.
I did it with Navicat ... of you can do it in a way that doesn't depend on my desktop computer and works for all instances of BETYdb that would be preferable. Then I can delete the one on my desktop On Thu, Feb 23, 2017 at 2:25 PM Scott Rohde notifications@github.com wrote:
@dlebauer https://github.com/dlebauer Did you use a cron job? If so, which crontab did you use?
I think this is OK as far as it goes, but by themselves, these two commands aren't sufficient: You also need to refresh traitsview_private and (if used) yieldsview_private. Moreover, the refreshes should be done in order of dependency unless you want to wait an extra day to see the result of changes to the traits or yields tables. In other words, use the following order:
- Refresh traitsview_private and yieldsview_private (in either order).
- Refresh traits_and_yields_view_private.
- Refresh traits_and_yields_view.
Note that refreshing traitsview_private takes a while.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/PecanProject/bety/issues/419#issuecomment-282110308, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcX5wIgtwEqA5j_CXyKwqVotPVDBJqOks5rfesngaJpZM4IwRdK .
@robkooper I notice the bety6 root crontab is managed by puppet. Can we have puppet handle refreshing the views as well? Or should I just put something in my own crontab? @dlebauer I assume we also want to refresh materialized views on bety7, yes?
bety7 is lower priority than betydb.org ... can this be added as a general solution e.g. via migration + rake task? On Thu, Feb 23, 2017 at 2:37 PM Scott Rohde notifications@github.com wrote:
@robkooper https://github.com/robkooper I notice the bety6 root crontab is managed by puppet. Can we have puppet handle refreshing the views as well? Or should I just put something in my own crontab? @dlebauer https://github.com/dlebauer I assume we also want to refresh materialized views on bety7, yes?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PecanProject/bety/issues/419#issuecomment-282113489, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcX51gsfjvNiYAowtp5UoREuK1vAY0uks5rfe4EgaJpZM4IwRdK .
Or at least added to documentation in a way that is straightforward to implement? I suspect other users will want this feature as well. On Thu, Feb 23, 2017 at 3:05 PM David LeBauer dlebauer@gmail.com wrote:
bety7 is lower priority than betydb.org ... can this be added as a general solution e.g. via migration + rake task? On Thu, Feb 23, 2017 at 2:37 PM Scott Rohde notifications@github.com wrote:
@robkooper https://github.com/robkooper I notice the bety6 root crontab is managed by puppet. Can we have puppet handle refreshing the views as well? Or should I just put something in my own crontab? @dlebauer https://github.com/dlebauer I assume we also want to refresh materialized views on bety7, yes?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PecanProject/bety/issues/419#issuecomment-282113489, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcX51gsfjvNiYAowtp5UoREuK1vAY0uks5rfe4EgaJpZM4IwRdK .
If you need to update the view, use the cronjob. Only those entries that have a comment in front #puppet will be managed by puppet. You can add any crontab entries at the top you want.
Just updating bety7 (terraref.ncsa.illinois.edu/bety-test) and refreshing materialized views. This database should be fairly static, but recording the queries in the prescribed order
refresh materialized view traitsview_private;
refresh materialized view yieldsview_private;
refresh materialized view traits_and_yields_view_private;
refresh materialized view traits_and_yields_view;
@dlebauer These commands are now in a script file (Rails root)/script/cron/refresh_views.psql
in the repository. I had put such a file on bety6 earlier but neglected or chose not to check it in (and I had the last two lines reversed, meaning an extra day to see view updates). It's checked in now (though the migration to switch to materialized views isn't yet).
I added a cron job on bety7 to run this nightly. It's in machine-user postgres's crontab if you want to adjust the frequency or even disable it. It wouldn't be too hard to elaborate the script somewhat to only refresh the views if something has actually changed since the last time they were refreshed, or only if X number of rows were added or updated since the last time they were refreshed.
Next I am also going to try to add an index
create index on traits_and_yields_view (trait);
create index on traits_and_yields_view (trait, sitename);
At this point, SQL is performant while the API is an order of magnitude slower.
perhaps replace the Ruby json creator under the API with an SQL that exports as json
Closing: speeding up API will be implemented by https://github.com/PecanProject/bety/issues/516
[Added 3-7-2017:] Here are the tasks associated with this issue:
=========================================================== On the terraref.ncsa.illinois.edu/terra-test instance I've added ~16 million rows.
A few issues this has caused:
Here are some sample queries that take too long. Now that :