SEED-platform / seed

Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.
Other
107 stars 55 forks source link

MAPPING: Fix Slow Type ahead list #10

Closed RDmitchell closed 9 years ago

RDmitchell commented 9 years ago

I have a Portfolio Manager data file with 192 fields in it, and during the mapping phase, the program is very slow to generate the list when I am typing ahead. To the point of waiting a few seconds for each letter to appear after I have typed it. Need to figure out a way that the program can handle this number of fields when generating the type ahead list.

MarshallDDB commented 9 years ago

This is partially a server latency issue, and is better in some instances of SEED than others.

RDmitchell commented 9 years ago

I have a Portfolio Manager data file with 250 fields in it, and during the mapping phase, the program is very slow (and gets slower and slower) in generating the list when I am typing ahead. To the point of waiting a few seconds for each letter to appear after I have typed it. Need to figure out a way that the program can handle this number of fields when generating the type ahead list.

Miles and I witnessed this behavior first hand in a call with Marshall in DC, and also on my home computer.

Rich thinks that it is an issue with JavaScript on the local computer, as it appears to peg out the CPU on the local computer -- we saw this on Marshall's computer in the webinar, and on my home computer, but I have actually not seen that same behavior on my work computer, even when the type ahead list is operating slowly -- my CPU is not maxed out.

There is an example file in the gdrive Issues folder for this issue.

RDmitchell commented 9 years ago

Needs a PM file with a lot of columns to that he can use for debugging. Robin to provide.

RDmitchell commented 9 years ago

Here is a graphic showing where in the program the problem is. image

RDmitchell commented 9 years ago

Miles put a fix from Greg for this and it seems to have broken mapping. This is what I now get on seedtest.lbl.gov when I try to map one of Marshall's files with lots of fields. There are errors, plus no fields show up for mapping.

image

RDmitchell commented 9 years ago

I also tried with the sample covered building list csv and got the same error.

skyhawk86305 commented 9 years ago

Error is code. Fix pushed to branch FixSlowTypeAheadList today. I tested this using the SEED platform built in the Vagrant environment using the code in /seed/samples/SEED-Platform. See the README.md file for instructions as to how to build. I suggest that this might be the hand-off environment we give new developers in January to build and test SEED from.

Full-stack Web Developer and Device Driver expert

On Wed, Dec 24, 2014 at 4:31 PM, RDmitchell notifications@github.com wrote:

I also tried with the sample covered building list csv and got the same error.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68078547.

RDmitchell commented 9 years ago

Miles, if you can merge this in to tge seedtest repo I will test it. On Dec 28, 2014 6:06 PM, "Greg White" notifications@github.com wrote:

Error is code. Fix pushed to branch FixSlowTypeAheadList today. I tested this using the SEED platform built in the Vagrant environment using the code in /seed/samples/SEED-Platform. See the README.md file for instructions as to how to build. I suggest that this might be the hand-off environment we give new developers in January to build and test SEED from.

Full-stack Web Developer and Device Driver expert

On Wed, Dec 24, 2014 at 4:31 PM, RDmitchell notifications@github.com wrote:

I also tried with the sample covered building list csv and got the same error.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68078547.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68227363.

mmclark commented 9 years ago

I'll work on that later today.

thanks,

Miles

On Mon, Dec 29, 2014 at 9:33 AM, RDmitchell notifications@github.com wrote:

Miles, if you can merge this in to tge seedtest repo I will test it. On Dec 28, 2014 6:06 PM, "Greg White" notifications@github.com wrote:

Error is code. Fix pushed to branch FixSlowTypeAheadList today. I tested this using the SEED platform built in the Vagrant environment using the code in /seed/samples/SEED-Platform. See the README.md file for instructions as to how to build. I suggest that this might be the hand-off environment we give new developers in January to build and test SEED from.

Full-stack Web Developer and Device Driver expert

On Wed, Dec 24, 2014 at 4:31 PM, RDmitchell notifications@github.com wrote:

I also tried with the sample covered building list csv and got the same error.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68078547.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68227363.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68278359.

mmclark commented 9 years ago

This branch is now running on seedtest.lbl.gov

On 12/29/2014 09:33 AM, RDmitchell wrote:

Miles, if you can merge this in to tge seedtest repo I will test it. On Dec 28, 2014 6:06 PM, "Greg White" notifications@github.com wrote:

Error is code. Fix pushed to branch FixSlowTypeAheadList today. I tested this using the SEED platform built in the Vagrant environment using the code in /seed/samples/SEED-Platform. See the README.md file for instructions as to how to build. I suggest that this might be the hand-off environment we give new developers in January to build and test SEED from.

Full-stack Web Developer and Device Driver expert

On Wed, Dec 24, 2014 at 4:31 PM, RDmitchell notifications@github.com wrote:

I also tried with the sample covered building list csv and got the same error.

— Reply to this email directly or view it on GitHub

https://github.com/SEED-platform/seed/issues/10#issuecomment-68078547.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68227363.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-68278359.

RDmitchell commented 9 years ago

I am adding the info from Greg about his proposed fix (email from him dated 1/5/2015). Greg and Miles are discussing their final solution and will let us know either in this issue or in the Dev meeting next Tuesday (1/13/2015), so that we can decide on the correct approach and implement it.

* problem.txt from Greg 1/5/2015 *

Slow TypeAhead Issue #10.

During the mapping phase, the program is very slow to generate a list of SEED fields when presented with a typeahead list of "suggested" field names. To the point of waiting a few seconds for each letter to appear after typing, causing the CPU to spike at 100%.

Problem: The problem is being caused by an inordinate number of $watch statements being created for unbound nested scope variable find_duplicates. scope variables can be function objects as well as primitives. There are very likely more instances to this behavior in other parts of the project. One potential solution is to rid the application of unneceasssary scope variables, so as to not create unnecesary $watch statements that inject themselves into the Angular digest cycle and cause delay. When the list is presented to the user, the current code checks for duplicates when presenting the list to the user via the typeahead-on-select directive using the scope object change. The scope variable is nested inside two Angular directives(i.e typeahead-on-select nested inside ng-repeat). For each item that is iterated through the ng-repeat loop(i.e the 200+ or so entries) Angular creates a new "child" scope variable, which inherits from the parent scope but also assigns the property(is_duplicate) to each new child scope instance.

Solution: Eliminate the large number of $watches creates for scope variables, thus reducing the processing time of the Angular $digest cycle.

For example...

change this:


$scope.find_duplicates = function (array, element) {
    var indicies = [];
    var idx = array.indexOf(element);
    while (idx !== -1) {
        indicies.push(idx);
        idx = array.indexOf(element, idx + 1);
    }
    return indicies;
};

/*
 * Returns true if a TCM row is duplicated elsewhere.
 */
$scope.is_tcm_duplicate = function(tcm) {
    var suggestions = [];
    for (var i = 0; i < $scope.raw_columns.length; i++){
        var potential = $scope.raw_columns[i].suggestion;
        if (typeof potential === 'undefined' ||
                potential === '' ||
                ! $scope.raw_columns[i].mapped_row
        ) {
            continue;
        }
        suggestions.push($scope.raw_columns[i].suggestion);
    }
    var dups = $scope.find_duplicates(suggestions, tcm.suggestion);
    if (dups.length > 1) {
        return true;
    }
    return false;
};

to this:

    $scope.is_tcm_duplicate  = function(tcm) {
        var suggestions = [];
        var dups = 0;
        for (var i = 0; i < $scope.raw_columns.length; i++){
            var potential = $scope.raw_columns[i].suggestion;
            if (typeof potential === 'undefined' ||
                    potential === '' ||
                    ! $scope.raw_columns[i].mapped_row
            ) {
                continue;
            }
         suggestions.push($scope.raw_columns[i].suggestion);
        }  // end for var i = 0)

        return check_duplicates(suggestions, tcm.suggestion) 

    };

/*  01-14-2015   - addded check_for_duplicates */   

    function check_duplicates(array, element) {
        var indicies = [];
        var idx = array.indexOf(element);
        while (idx !== -1) {
            indicies.push(idx);
            idx = array.indexOf(element, idx + 1);
        }
        if (indicies.length > 1)
           return true;

        return false;
    }

The number of $watch statements is greatly reduced by creating a local javascript function called check_duplicates that returns true or false based on a duplicate being found or not. Additionally, is_tcm_duplicate can be changed to a local javascript function and further reduce the number of $watchs. There may be additional changes needed in mapping_controllers.js to further reduce the number of $watch statments.

Short Term Solution: Reduce the number of unbound $scope variables in mapping_controller.js

1 day.

Long Tem Solution: Create a custom typeahead directive.

A better, longer term solution would be to create a custom directive(i.e SEED-Typeahead). A directive is a marker on a DOM element that attaches a specific behavior to that DOM element. Directives do not process scope using the normal parent/child relationship and treat scope as seperate or "isolated", except for specfic model instances that are attached to the scope. That allows the directive to be reusable in other parts of the project.

2-3 days.

skyhawk86305 commented 9 years ago

Duplicate Checking: To prevent checking duplicates with each keypress, the change function in mapping_controllers.js would be altered to not call is_tcm_duplicate. Additionally, in order to handle checking duplicates, as the user moves between fields, a directive would be written to handle the mouseenter, click, keydown and blur events. A blur event is triggered when the user leaves the field. The key portion of the code is to create a watch action that keeps track of the current item highlighted and compares it to duplicate items in the list. That directive would also handle autocomplete events based on the number of keystrokes need to trigger the existing promise that populates the list. I also suggest adding a single HTML div tag for debugging purposes, just after the Table Body tag in mapping.html(around line 93). For unit testing, I suggest one Jasmine test case written to exercise both the directive and DOM element using with Selenium or Phantom.js.

rebrownlbnl commented 9 years ago

This seems like a good solution. I'd like to hear Miles' opinion.

mmclark commented 9 years ago

I agree. Let's move forward.

On 01/11/2015 11:18 PM, Rich Brown wrote:

This seems like a good solution. I'd like to hear Miles' opinion.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-69535788.

RDmitchell commented 9 years ago

I also concur.

skyhawk86305 commented 9 years ago

Robin, I'm running a baseline test using the existing code on SEED(i.e master branch) before testing out my changes.

In regards to testing: After loading the spreadsheet file and clicking the "Continue to Data Mapping" button, the "Data Mapping and Validation" page is presented. The "Header" column has the suggested text loaded. For fields labeled "invalid" and in red, should not the cursor be positioned at the far right of the text field. Question: Where should the user start typing in the text field?. For example: Using the test data set with 200+ fields, the first invalid field displayed after loading is the second field, which is labeled "Multifamily_-_Number_of_Bedrooms" to the right of the "invalid" indicator. The SEED Header suggestion is "Multifamily Housing - Number of Bedrooms". The cursor is positioned to the right of the 'g' in 'Housing". If I type anything into the text field after that it changes the invalid indicator to Valid(Green) including a space. For the next field, 'immingPool-_Months_in_U", if I move to the text field for the SEED Header and remove the "S" from "Swimming Pool - Months in Use" it also changes the field indicator to "valid". Am I missing something here because it seems both these behaviors seem strange. However, if I highlight the text field and delete the text and start typing I see that one of the suggested fields shows up as "Swimming Pool - Months in Use" . When I select that field from the dropdown list and hit enter it does not change the field from invalid to valid, which I believe it should. Perhaps you and I or someone from the team can run through this over phone. I need to be sure that what I'm correcting is, in fact, the correct behavior. Greg .

Full-stack Web Developer and Device Driver expert

On Mon, Jan 12, 2015 at 11:50 AM, RDmitchell notifications@github.com wrote:

I also concur.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-69622658.

RDmitchell commented 9 years ago

Greg

Is this behavior something I can see on seedtest.lbl.gov?

If not, maybe we can have a call and you can show me the behavior.

Robin


Robin Mitchell

Building Technology and Energy Systems

Lawrence Berkeley National Laboratory

From: Greg White [mailto:notifications@github.com] Sent: Tuesday, January 13, 2015 10:11 AM To: SEED-platform/seed Cc: RDmitchell Subject: Re: [seed] MAPPING: Fix Slow Type ahead list (#10)

Robin, I'm running a baseline test using the existing code on SEED(i.e master branch) before testing out my changes.

In regards to testing: After loading the spreadsheet file and clicking the "Continue to Data Mapping" button, the "Data Mapping and Validation" page is presented. The "Header" column has the suggested text loaded. For fields labeled "invalid" and in red, should not the cursor be positioned at the far right of the text field. Question: Where should the user start typing in the text field?. For example: Using the test data set with 200+ fields, the first invalid field displayed after loading is the second field, which is labeled "Multifamily_-_Number_of_Bedrooms" to the right of the "invalid" indicator. The SEED Header suggestion is "Multifamily Housing - Number of Bedrooms". The cursor is positioned to the right of the 'g' in 'Housing". If I type anything into the text field after that it changes the invalid indicator to Valid(Green) including a space. For the next field, 'immingPool-_Months_in_U", if I move to the text field for the SEED Header and remove the "S" from "Swimming Pool - Months in Use" it also changes the field indicator to "valid". Am I missing something here because it seems both these behaviors seem strange. However, if I highlight the text field and delete the text and start typing I see that one of the suggested fields shows up as "Swimming Pool - Months in Use" . When I select that field from the dropdown list and hit enter it does not change the field from invalid to valid, which I believe it should. Perhaps you and I or someone from the team can run through this over phone. I need to be sure that what I'm correcting is, in fact, the correct behavior. Greg .

Full-stack Web Developer and Device Driver expert \

On Mon, Jan 12, 2015 at 11:50 AM, RDmitchell notifications@github.com wrote:

I also concur.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-69622658.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-69790538 .Image removed by sender.

RDmitchell commented 9 years ago

Testing:

For the same browser (Chrome) screen width, the BEDES column is wider on seedtest than seed. Don't know if this is from the work for this issue or not. Is nice to have that column be as narrow as possible

image

RDmitchell commented 9 years ago

Testing

Seems like seedtest (with the new type ahead list fix) is not checking for duplicates, while seed (production server with original code) IS checking for duplicates.

This is before any "manual" mapping (which uses the type ahead list) by the user takes place. This is just what appears when the program first displays the mapping screen.

I know that part of the change for the type ahead list was to not check for duplicates with every key stroke, but this new fix seems like it never checks for duplicates, even at the very beginning of the process.

I think we want to keep some level of duplicate checking. Is this not going to be possible with this current fix?

The screen shot below shows one section of the mapping screen (for the file with LOTS of Portfolio Manager fields) but the complete comparison of all the fields between seed and seedtest can be found in the drive folder for this issue, a gdoc called "Issue 10 - Testing"

image

RDmitchell commented 9 years ago

The type ahead list seems to be behaving pretty well -- it still can use a bit of processing resources, but it doesn't seem to slow down as you move through the list. (I did this test on the PM file that has 200+ fields). So that is good.

However, the program is not checking for duplicates anywhere in the process, and it needs to. We might want to discuss when to check for duplicates (if it doesn't do it in real time, it should do it when the user clicks "Map Your Data").

My test for duplicate data

So this is not the correct behavior

I seem to have come down with the flu, so probably won't be doing much more testing until tomorrow,but it seems like there are lots of things to work on before I need to test again.

skyhawk86305 commented 9 years ago

After some additional testing with Chrome I found another bug. Chrome is not handling the keyup event correctly. Will attempt to fix tomorrow.

RDmitchell commented 9 years ago

Testing latest fix: characters are still slow to appear. I can type a whole word, wait, then the word appears. Also, program still seems to be using a lot of CPU resources when displaying the type ahead

Because we don't have a good metric to asses this, it is hard to know if this is an improvement. The improvement might be that it doesn't get slower and slower as you work through the list, it is just uniformly slow (and obviously then less slow than when it used to get completely bogged down by the middle or end of the list).

I guess I would say that it is slightly better but not that much of an improvement. I will do more testing later today to try to answer this question.

image

RDmitchell commented 9 years ago

Megha -- I am giving you this bug. You may not be able to see the problem, as it seems to be computer dependent. We can go over it together whenever you decide to take it on, and I can try to show you the problem.

RDmitchell commented 9 years ago

Megha -- I have made a video that shows this problem. The SWF video file and the XLSX data file used in the video are both in the gdrive folder for this issue.

meghasandesh commented 9 years ago

I am seeing a bit of slow down in the type ahead on my iMac, so I would imagine it only gets worse on other machines. I will look into this.

On Tue, Feb 10, 2015 at 11:58 AM, RDmitchell notifications@github.com wrote:

Megha -- I have made a video that shows this problem. The SWF video file and the XLSX data file used in the video are both in the gdrive folder for this issue.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-73772012.

RDmitchell commented 9 years ago

here is the link for the folder for this issue https://drive.google.com/a/lbl.gov/folderview?id=0B3fTKpZ9Dx7LTnk4Z0RiRXVHMG8&usp=sharing

meghasandesh commented 9 years ago

Thanks, I have saved the 'Issues' folder into my drive, so I will be seeing any updates you make there.

Megha

On Tue, Feb 10, 2015 at 12:22 PM, RDmitchell notifications@github.com wrote:

here is the link for the folder for this issue

https://drive.google.com/a/lbl.gov/folderview?id=0B3fTKpZ9Dx7LTnk4Z0RiRXVHMG8&usp=sharing

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-73776295.

meghasandesh commented 9 years ago

Robin,

I have uploaded a fix to this issue. You can test it when Miles pulls the change to seedtest.

Miles.

Could you pull this branch in to seedtest?

Thanks

Megha

mmclark commented 9 years ago

The typeahead branch is now running on seedtest

On 02/20/2015 07:40 PM, meghasandesh wrote:

Robin,

I have uploaded a fix to this issue. You can test it when Miles pulls the change to seedtest.

Miles.

Could you pull this branch in to seedtest?

Thanks

Megha

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-75353858.

meghasandesh commented 9 years ago

Thank you!

On Mon, Feb 23, 2015 at 7:58 AM, Miles Clark notifications@github.com wrote:

The typeahead branch is now running on seedtest

On 02/20/2015 07:40 PM, meghasandesh wrote:

Robin,

I have uploaded a fix to this issue. You can test it when Miles pulls the change to seedtest.

Miles.

Could you pull this branch in to seedtest?

Thanks

Megha

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-75353858.

— Reply to this email directly or view it on GitHub https://github.com/SEED-platform/seed/issues/10#issuecomment-75570428.

RDmitchell commented 9 years ago

I will test.

cycustodio commented 9 years ago

API test this morning had no issues.

RDmitchell commented 9 years ago

I have tested this and it is a great improvement. It is ready to be merged into production. I am going to close this issue.