ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License
84 stars 24 forks source link

Admin Page is Really Slow #380

Open Soben713 opened 7 years ago

Soben713 commented 7 years ago

I created this issue to address the performance problem in the admin page. As we all know, the admin page takes a while to load. And even after it is loaded, it is very slow due to the massive data that needs to be loaded and rendered in the page. This needs to be fixed.

Related issue: #343

Soben713 commented 7 years ago

One of the main causes of this problem is the huge DC map which includes all the labels. There seems to be a way to make the map a little bit faster by using canvas instead of SVG layers. It seems to be in the leaflet's documentation too (note that the example in the documentation is exactly the problem we have — "many thousands of circle markers on the map").

image

I checked this solution and it makes it significantly faster (but it is not perfect yet) on my localhost. However, the image quality is reduced a little bit (my guess is that it only happens in high-density displays though). Here is a screenshot for comparison:

L_PREFER_CANVAS = true --> lower quality, higher performance: image

L_PREFER_CANVAS = false --> higher quality, lower performance: image

jonfroehlich commented 7 years ago

Interesting. I'd like to assign this to @tongning along with https://github.com/ProjectSidewalk/SidewalkWebpage/issues/343 but we should discuss this. I'll assign it tentatively to @tongning for now. I think focusing on https://github.com/ProjectSidewalk/SidewalkWebpage/issues/343 first should be a priority.

tongning commented 7 years ago

Two of the tables (labels and users) are quite huge as well. On my computer removing them reduced the page load time to 3-5 seconds from 10 seconds (as measured by Page load time chrome extension). Moving these tables to different pages should help significantly.

jonfroehlich commented 7 years ago

@tongning, yep, that's all part of redesigning the admin page with a menu/tab interface: https://github.com/ProjectSidewalk/SidewalkWebpage/issues/343. Thanks for investigating this. We can use a similar approach to ensure that our pages have fast load times.

manaswisaha commented 7 years ago

Adding Anthony's recent investigations (in #588):

Even with the smaller test dataset, the server takes around 4-5 seconds to respond with the admin page after the initial request, according to the waterfall in Chrome dev tools. This may potentially be related to the OutOfMemory issues at #508.

I removed various elements of the admin page one at a time, to see which parts are contributing most to the page generation time. Documenting the results below for reference, time in seconds.

Original page: (5.37, 4.45, 4.05, 4.55, 4.21)

Without the Recent Labels table: (3.90, 3.61, 3.61, 3.76, 3.49)

Without the Activities table: (4.19, 3.78, 3.57, 3.79, 3.66)

Without the Registered Users table: (2.29, 2.28, 2.20, 2.13, 2.15)

The registered users table seems to be contributing a lot, especially with only ~150 users in the test dataset. Improving the table pagination so that the server doesn't have to return all of the data at once might help with this.

misaugstad commented 6 years ago

The registered users table seems to be contributing a lot, especially with only ~150 users in the test dataset. Improving the table pagination so that the server doesn't have to return all of the data at once might help with this.

Yeah well over 50% off the time of loading the initial admin page is from the user table. I think this is because we are querying for the list of users, then for each user we are running 5 separate queries to get metadata about them. I think if we did this in just a single query (or even in 5 queries) instead of having a number of queries equal to 5 times the number of users, this would be a lot faster. I'm talking about bringing the admin page loading time from 2 minutes to 30 seconds or so, so it still isn't going to be fast at all :grin:

jonfroehlich commented 6 years ago

Reducing the number of queries is important... another strategy is pre-computing data and offloading to another table. Is that easy to do?

misaugstad commented 6 years ago

It is definitely more complicated than just optimizing the queries themselves, especially if we want to be able to see the most up-to-date information on the admin page (which we often do), b/c that would require real-time updates to those tables.

I think that pre-computed tables are always an option, but are closer to a last resort (especially if it is just for the admin page). We probably all agree that the way to do this is to iteratively look for what part of the page is taking the longest to load and fixing that. So if this part still takes longer to load than anything else, then we should look into pre-computing here.

misaugstad commented 6 years ago

Okay so I found a few cases where adding an index to a table improves performance for certain queries. It isn't a silver bullet, but it can definitely help in some cases. The best part is that it really doesn't have a downside for us. The typical downside is that it makes insertion and deletion queries slower, but that isn't a problem for us at all right now.

It is also nice because adding indices doesn't require adding new code. We can just log into the server and make those updates to the database.

misaugstad commented 6 years ago

Furthermore, I think that we should make these updates directly to the database ourselves instead of adding them to a Play evolutions file in the code because some of the indices we want to add will take hours to create, and that's not something I think we should have running automatically whenever someone wants to test out old versions of the code in the local dev environment.

misaugstad commented 6 years ago

And the most time-consuming query we have is the one that computes the amount of time audited by each user. I think that we should pre-compute these values and put them in a table, updating nightly. This query isn't being used on the main admin page, it is only run when you click on the "analytics" tab, but it takes something like 5 minutes to run.

manaswisaha commented 6 years ago

Definitely, agree. They shouldn't be a part of the play evolutions. However, they can be a part of the initial evolutions script when we set up a new database and the application e.g. for a new city. On an empty database, its quicker and who don't have deal with slow indexing.

On Fri, Oct 5, 2018 at 11:05 AM Mikey Saugstad notifications@github.com wrote:

Furthermore, I think that we should make these updates directly to the database ourselves instead of adding them to a Play evolutions file in the code because some of the indices we want to add will take hours to create, and that's not something I think we should have running automatically whenever someone wants to test out old versions of the code in the local dev environment.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ProjectSidewalk/SidewalkWebpage/issues/380#issuecomment-427451543, or mute the thread https://github.com/notifications/unsubscribe-auth/ACvXgIWRw9XByZP3n1KJfox_tpntyQOCks5uh59xgaJpZM4KjCq1 .

-- Best Regards, Manaswi Saha Ph.D. Student Paul G. Allen School of Computer Science & Engineering University of Washington, Seattle homes.cs.washington.edu/~manaswi http://homes.cs.washington.edu/~manaswi Twitter - @manaswisaha https://twitter.com/manaswisaha

misaugstad commented 6 years ago

However, they can be a part of the initial evolutions script when we set up a new database and the application e.g. for a new city.

Agreed, I'm not exactly sure what the process is for creating a new database yet, but I think the fresh database we make should be set up correctly like this :)

misaugstad commented 3 weeks ago

When we get back to improving the load times of the Admin page (pretty important right now, since we can't even see the Admin page in Seattle #3544), it should be noted that the tabbed interface did a lot to help here, and I think that the low hanging fruit has been picked already. I did some profiling on my local environment: The page takes about 50-55 seconds to load, but no single query took more than 2 seconds total, even if it was used multiple times. The remaining methods of speeding up the initial page load that remain:

  1. There are a lot of places where we are running essentially the same query multiple times for different situations. Take the countValidationsByResult() functions. We are running a query to count the number of "Agree" votes, running another to count the number of "Disagree", and a third to count the number of "Unsure". It's even worse, because for each of those we are running a query for each label type as well, for a total of 27 separate queries that could really be run as one query that gives a count by validation result and label type, and then we can fill in the whole table using that output. The reason that we did it this way is that it's much easier to reason about it in the template scala (.scala.html) files; it's much easier to read/understand the HTML this way.
  2. Moving queries out of the template scala (.scala.html) and into new API endpoints that we then call from JS after the initial page load. When we run request this data in the template scala, we are telling the application to run the computation and put the result directly into the HTML before we send along the page's HTML. This can take over a minute, resulting in a proxy error! The most important thing is that the page loads at all. So even if the total load time doesn't decrease, getting the page to load and then sending further requests to the server for more data that needs to be computed would be worthwhile.

    Furthermore, when we upgrade to a newer version of the Play framework (#1258), we'll need to remove the directly embedded function calls from our template scala anyway! So there's really no reason not to do this. Plus it works hand in hand with point 1: Rather than sending 27 GET requests for data on validation counts, it makes a lot more sense to just send one request and then put data in the correct places using JS.

    When we do this, it would be helpful to add some sort of loading icon in places where we are waiting on data from the server, and then replace those with the data as we get it, or some sort of error icon if the GET request fails.

Here are a few functions I've seen that could benefit from some of this consolidation and removal from the template scala:

  1. countValidations() / countValidationsByResult(result: Int) / countValidationsByResult(result: Int, labelType: String) could be turned into a single query
  2. countValidationUsersContributed / countValidationResearchersContributed / countAllValidationUsersContributed
  3. countLabels() / countLabels(labelType: String)

The list goes on!

When working on this, you may want to try to prioritize which functions you do this to based off of the total time spent across all versions of that function. I just did a quick test of this for the countValidations functions by creating a variable var totalTime: Long = 0 in LabelValidationTable.scala. Then I turned countValidations(labelType: String) from this:

def countValidations(labelType: String): Int = db.withSession { implicit session =>
  val typeID = LabelTypeTable.labelTypeToId(labelType)

  validationLabels.innerJoin(labelsWithoutDeleted).on(_.labelId === _.labelId)
    .filter(_._2.labelTypeId === typeID)
    .length.run
}

to this

def countValidations(labelType: String): Int = db.withSession { implicit session =>
  val startTime: Long = System.nanoTime()
  val typeID = LabelTypeTable.labelTypeToId(labelType)

  val x = validationLabels.innerJoin(labelsWithoutDeleted).on(_.labelId === _.labelId)
    .filter(_._2.labelTypeId === typeID)
    .length.run
  val endTime: Long = System.nanoTime()
  val duration: Long = endTime - startTime
  totalTime += duration
  Logger.info(s"This time: ${duration / 1e9}, total time: ${totalTime / 1e9} seconds")
  x
}

I did this for all three countValidations functions to get the total time spent on them across all of those functions when loading the admin page. It's a little bit hacky, but it was quick and easy to do to get an idea of how much total time was spent on those funcs.