EyeSeeTea / SurveillanceCambodiaApp

Mobile application for reporting malaria cases in Cambodia to a DHIS2 server (pictureapp blessed repository)
GNU General Public License v3.0

Slow DHIS2 Sync #97

Closed: QISPSK closed this issue 7 years ago

QISPSK commented 8 years ago

Is there any way to speed up the sync with DHIS2? Perhaps on the second sync, for OUs that don't have any MCS app data? The sync seems to take an inordinate amount of time regardless of internet connection speed. I'm not entirely sure what is syncing or whether this process can be sped up at all.

QISPSK commented 8 years ago

This is increasingly becoming a bigger issue, as it appears that the phone is downloading more data. The initial sync seems to require at least 11 MB of data for the list of OUs and was timed at more than 40 minutes on a 3G connection at a recent training in Kratie (note that 22 other phones were attempting to sync with DHIS2 at the same time; not sure whether this affects the sync). I'm not entirely sure what data is being downloaded, but given the size it seems to be more than a simple list of OUs in DHIS2. Is all of this data required, or is there a way to trim some fat here? 11 MB shouldn't take 40 minutes to download on 3G regardless, so this seems like an issue with the app/DHIS2.
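A quick back-of-the-envelope check of the figures above (about 11 MB in about 40 minutes), assuming 1 MB = 1,000,000 bytes and ignoring protocol overhead and retries, gives an average of under 40 kbit/s. That is well below what a typical 3G connection delivers, which is consistent with the bottleneck being somewhere other than raw bandwidth. The class and method names are illustrative only.

```java
import java.util.Locale;

// Back-of-the-envelope throughput check for the reported initial sync.
// Assumes 1 MB = 1,000,000 bytes and ignores protocol overhead and retries.
final class SyncThroughput {

    /** Average throughput, in kilobits per second, of a transfer. */
    static double kilobitsPerSecond(double megabytes, double minutes) {
        double bits = megabytes * 1_000_000 * 8; // MB -> bits
        double seconds = minutes * 60;           // minutes -> seconds
        return bits / seconds / 1_000;           // bit/s -> kbit/s
    }

    public static void main(String[] args) {
        double kbps = kilobitsPerSecond(11, 40);
        // Prints "Effective throughput: 36.7 kbit/s"
        System.out.println(String.format(Locale.ROOT, "Effective throughput: %.1f kbit/s", kbps));
    }
}
```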

The second sync with the OU takes just as long, clocking in at over 40 minutes on a 3G connection as well. Note that all 22 phones were trying to sync with the same OU; I'm not sure this affects things, as syncs to different OUs at different times appear to take just as long (though I haven't timed them recently). The timing seems independent of whether the OU has events or not.

Testing sync time on 4G and WiFi seems to shave little (if any) time off the sync (though I don't have official times for these; I can test and get them if necessary).

QAOs/POs/MDOs report having to wait an hour or more at certain facilities for the app to fully resync when there is an issue. This is too long, especially for MDOs, who need to see 10+ providers per day in order to visit all providers in PSK's network each month. In addition, we often change the OU three times during half-day trainings (for the pre-test, practice, and post-test), and the increasingly long sync times are preventing trainings from finishing on time, especially if a sync fails and we need to resync both the OU list and the OU, which took approximately 80 minutes in the Kratie training on 3G.

Is there any way to speed up this sync going forward?

ifoche commented 8 years ago

@QISPSK I think we're hitting what is mainly a "production" issue. It doesn't happen in dev environments, where the amount of data is very small and the server is barely used, so syncs take only a couple of minutes.

There are certainly some strategies we could use to speed up that synchronisation. In my opinion, this slow sync is due to the large amount of data the app is trying to download: behind the scenes, all the devices are uploading/downloading data using the same user, and there's no way in the current DHIS2 SDK to download data for only one OU; it downloads data for all the OUs the user has access to. That's combined with the production server not being very responsive when several devices make requests at the same time. I've even seen long delays in simple requests to the production server through the website, and the synchronisation involves several calls to the server.

So each time the user makes the first sync, they're downloading all the data associated with the KHMCS user (for all the OUs), and that can be huge in a production environment. After that, what the second sync is really doing is transferring the data associated with the selected OU from the downloaded DB to the local DB. So the second one happens only locally, with no server involved.
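The two-phase behaviour described above can be pictured with a small sketch: phase 1 (network) fetches events for every OU visible to the shared user, and phase 2 (local) keeps only the selected OU's events. The `Event` type and method names are hypothetical stand-ins, not the SDK's actual classes. The point it illustrates is that the phase 1 download grows with the total number of OUs, whichever OU is finally selected.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the two-phase sync: phase 1 downloads events for all
// OUs the shared user can see; phase 2 (below) filters locally, with no server calls.
final class LocalOuFilter {

    /** Minimal stand-in for a downloaded DHIS2 event (hypothetical type). */
    static final class Event {
        final String uid;
        final String orgUnit;

        Event(String uid, String orgUnit) {
            this.uid = uid;
            this.orgUnit = orgUnit;
        }
    }

    /** Phase 2: keep only the events belonging to the OU chosen on the device. */
    static List<Event> eventsForOrgUnit(List<Event> downloaded, String selectedOu) {
        return downloaded.stream()
                .filter(e -> e.orgUnit.equals(selectedOu))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> downloaded = List.of(
                new Event("ev1", "OU_A"),
                new Event("ev2", "OU_B"),
                new Event("ev3", "OU_A"));
        System.out.println(eventsForOrgUnit(downloaded, "OU_A").size()); // 2
    }
}
```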

Apart from that, the synchronisation involves several calls that depend on the server response. If the server takes 10 seconds to process a request, each call waits 10 s before the answer starts to arrive, and if, say, 50 calls are involved, you would be waiting 500 s just because of the server latency.
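The latency estimate above is simple multiplication, but writing it down makes clear why server-side delay matters: when calls run one after another, total waiting time is calls × per-call delay, regardless of bandwidth. The figures (50 calls, 10 s) are the illustrative ones from the comment; the 2 s case is a hypothetical lower latency for comparison.

```java
// Waiting time when requests are issued strictly one after another,
// each incurring a fixed server-side delay before the response starts.
final class LatencyEstimate {

    static long sequentialWaitSeconds(int calls, long delaySecondsPerCall) {
        return calls * delaySecondsPerCall;
    }

    public static void main(String[] args) {
        System.out.println(sequentialWaitSeconds(50, 10)); // 500 s, the example above
        System.out.println(sequentialWaitSeconds(50, 2));  // 100 s with a hypothetical 2 s delay
    }
}
```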

So the first strategy to speed things up could be server side: improving the server's capacity to reduce its latency. That would automatically have an impact, but obviously it's not enough, and I would say it's not the biggest improvement we can get.

The second, and more important, strategy would be to change the way we're submitting data: instead of all devices using the same user, move to one user per device. That requires more administration, but it drastically reduces the amount of data each device downloads, so the sync improvement would be huge. This implies a relatively simple modification on the app side, but I imagine that on the server side it would require changing a lot of things (@rodmelia to clarify the amount of work involved in this strategy).

The third possibility would be to modify the current SDK to limit the data download to the selected OU only, rather than all of them. That could also produce a very big reduction in download time. However, it would mean modifying the Oslo SDK, which has since been completely rewritten. In fact, we're now working on migrating to the latest SDK (still in beta), so any work we spent on the old SDK would only be valid for that version, with no possible reuse, and would need to be done again for the new one.
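To illustrate what limiting the download to a selected OU could look like at the HTTP level, here is a sketch that builds a query against the DHIS2 Web API `events` resource using the `program`, `orgUnit` and `ouMode=SELECTED` parameters. It assumes the server supports those documented parameters (worth checking against 2.22 and 2.25 specifically) and is not how the Oslo SDK issues its requests internally. The server URL and UIDs are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: a DHIS2 Web API events query restricted to one org unit,
// instead of pulling events for every OU the shared user can access.
final class OuScopedEventsQuery {

    static String build(String serverUrl, String programUid, String orgUnitUid) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("program", programUid);
        params.put("orgUnit", orgUnitUid);
        params.put("ouMode", "SELECTED"); // only this OU, not its descendants
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return serverUrl + "/api/events.json?" + query;
    }

    public static void main(String[] args) {
        // Placeholder server and UIDs; substitute real values.
        System.out.println(build("https://dhis2.example.org", "PROGRAM_UID", "OU_UID"));
    }
}
```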

Since this issue mainly affects the trainings, and given the SDK evolution I described, which is coming soon (in the next few months), my recommendation would be not to spend time improving the SDK now, but to use a dev server for the trainings, with much less data (so the sync would take a couple of minutes), and wait for the next release to address this issue definitively with the new SDK. At the same time, I would start planning the server-side modifications needed to allow the next release to use a one-user-per-device strategy instead of a common user for everyone.

Thoughts? What do you think?

QISPSK commented 8 years ago

Hi @ifoche, thanks for your detailed response! That definitely helps clarify what is going on behind the scenes of these syncs.

We're a bit busy this week with a regional meeting, but we will definitely test the second option when we get the chance (likely early next week). Initial reactions from MIS to creating users for every OU were quite negative, given the large amount of work this would entail. However, they recommended that we try creating users specific to a province and program (e.g., Mondulkiri PPM providers, Mondulkiri MMWs, etc.). We plan to compare the sync time for KHMCSPSK, a provincial user that we'll create, and an individual user to determine how much time we can expect each option to save.

The first and third options both sound quite promising. I agree that since we will be shifting to a new version of DHIS2 soon, we should hold off on option three until the new version is up and running. If option one can be implemented quickly and is DHIS2-version agnostic, we would appreciate it if that could be implemented as well - every little bit helps!

Using the dev server for trainings seems like a potential temporary fix, especially if our test of option two doesn't yield major time savings. However, do you know if there is a way to transfer data from the dev server to the production server? We want to make sure that the pre- and post-test data we collect is stored on the production server for future reference.

Regardless, we will test out option 2 and will report back here with the results. Thanks again!

josemp10 commented 8 years ago

Hi guys,

I'm jumping in here...

About using the dev server: first we need to be sure that the org units are the same on both servers; if not, we should copy them from production to dev. How many org units (and which ones) are you talking about? Then, yes, it would be possible to move events from one server to the other (as long as the programs are also in sync on both servers).
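Before moving events from one server to the other, a pre-flight check along these lines could confirm that every org unit referenced by the events already exists on the target server (the same idea applies to programs). This is a hypothetical helper working on sets of UIDs; how those sets are fetched from each server is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-flight check before copying events between DHIS2 servers:
// every org unit referenced by the events must already exist on the target.
final class OrgUnitPreflight {

    /** Org unit UIDs used by the events that are missing on the target server. */
    static Set<String> missingOnTarget(Set<String> eventOrgUnits, Set<String> targetOrgUnits) {
        Set<String> missing = new TreeSet<>(eventOrgUnits); // sorted for stable output
        missing.removeAll(targetOrgUnits);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> used = Set.of("OU_A", "OU_B", "OU_C");
        Set<String> onTarget = Set.of("OU_A", "OU_C");
        System.out.println(missingOnTarget(used, onTarget)); // [OU_B]
    }
}
```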

@QISPSK About option 2 (or its partial approach): to me it would be a change affecting not only the MIS teams in Cambodia but also those in the region. I don't think Sam Ath, Paykeo, etc. will be able to handle that (it is always complicated for them to work at the provider level, and very difficult to maintain). Maybe it would be a bit easier for some countries, but not for others. If options 1 and 3 are promising, I believe we should go that way; the performance improvement should be more than enough with those options.

In reply to QISPSK's comment of Wed, Oct 19, 2016 at 5:21 AM.

rodmelia commented 8 years ago

Hi - the app was always conceived as not needing users. We clearly understand the heavy impact that managing hundreds of users would have locally, and it is an overhead that isn't needed: that's the magical part of this app, no need for multiple logins. I don't see a business need for logins to keep the functionality that we have or envision for the app (please refute my argument if you can identify a business need, rather than a technical need).

We simply have a problem with the SDK downloading unnecessary data. We are already using a forked SDK, so we just need to restrict the download to the active OU in the app. This applies to KH & Laos. We need to do this for 2.22 as well as 2.25, as we prepare the upgrade early next year.

Ignacio - can we work (in a separate conversation) on estimating the level of effort required to optimise the SDK for 2.22/2.25? Thanks!

In reply to QISPSK's comment of 19 October 2016 at 05:21.

ifoche commented 8 years ago

@QISPSK one quick derivative of the third strategy would be to let the user select the amount of data (s)he wants to download from the Login screen. If we add that option, keeping 6 months as the default value, it could provide another solution for the trainings, and it would be much quicker to implement than the selective download.

@rodmelia sure! We can keep chatting in another thread to estimate these efforts. Just let me know when suits you best.

Thanks!

QISPSK commented 8 years ago

@josemp10 and @rodmelia Thanks for jumping in guys - your input is always appreciated!

@ifoche The ability to choose the amount of data is an interesting one - definitely something we would be interested in if it is quick and easy to implement. It solves the issues in training, but it doesn't necessarily solve the longer-term problem of MDOs/QAOs/POs/etc. having to spend too long at providers' clinics if there is an error with their phone and they need to resync all 6 months.

Regardless, interested to hear what comes out of your and @rodmelia's conversations and hope that we can find an improvement soon. As mentioned before, the next training is November 3rd.

Thanks everyone!

rodmelia commented 8 years ago

Hi - I've added Graham & Chris via email. The recommendation that we want to put forward is:

1. Short term - add a new 'periods' dropdown to the login screen to select the number of days/weeks/months to sync. By default this value will be 0, and we will offer 6 days, 6 weeks and 6 months (in line with monitoring). Under the hood there are no changes: the app still needs to pull all events in the selected period, so 0 events is very fast and 6 months is very, very slow. This is a simple fix, a few hours of work, and it addresses the majority of the use cases: training, or new providers (0 events, super-fast sync). For the few cases where you need to download previous cases, you can try to keep it to just 6 weeks (not that bad), but avoid 6 months (very, very slow).

2. Long term - as part of the 2.25 upgrade, for which the SDK is being rewritten, new functionality that passes the selected Org Unit can be incorporated, so only events for that OU are pulled. That will only become available in Q1 2017.
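The short-term 'periods' dropdown could be modelled as a small enum that maps each option to a look-back window, with 0 (no events) as the default. This is a sketch under assumptions: the names are hypothetical, and the resulting start date would still have to be passed to whatever download call the SDK makes. Also note that `java.time` is only available on Android from API level 26 (or through core library desugaring), so the real app may need a different date API.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.Optional;

// Hypothetical model of the proposed login-screen "periods" dropdown.
enum SyncPeriod {
    NONE(null),                      // default: download no events (fastest)
    SIX_DAYS(Period.ofDays(6)),
    SIX_WEEKS(Period.ofWeeks(6)),
    SIX_MONTHS(Period.ofMonths(6));  // slowest option

    private final Period lookBack;   // null means "do not download events"

    SyncPeriod(Period lookBack) {
        this.lookBack = lookBack;
    }

    /** First day of the download window, or empty when no events should be pulled. */
    Optional<LocalDate> startDate(LocalDate today) {
        return Optional.ofNullable(lookBack).map(today::minus);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 11, 3);
        for (SyncPeriod p : values()) {
            System.out.println(p + " -> " + p.startDate(today).map(LocalDate::toString).orElse("no events"));
        }
    }
}
```

Keeping `NONE` as the first constant mirrors the proposal that the fastest option is the default.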

Do you find this approach appropriate?

In reply to QISPSK's comment of 24 October 2016 at 09:34.

QISPSK commented 8 years ago

@rodmelia This sounds like a good approach to us! It'll mostly be a training issue for the moment, but it will be good to have the longer term solution in place in Q1 as we work to validate data and switch everyone away from paper records.

purdych1 commented 7 years ago

Hi Rodolfo,

I discussed this with Sarah: we'll need a project code from the Cambodia team for this work to be completed, since Cambodia has commissioned this request, and 3919CAMB for Malariacare (the previous code for MCS) expired on September 30.

Cambodia team – please respond here with the project code that this work will be charged to, then we can proceed.

Thank you, Chris

In reply to QISPSK's comment sent Wednesday, October 26, 2016 12:15 AM.

QISPSK commented 7 years ago

Hi @purdych1, you can charge this to 4262CA. Let me know if you need anything else!

ifoche commented 7 years ago

@QISPSK we've just generated a new version with the fix. Please let me know if you find any bugs when you test it; otherwise, please close this issue. Thanks!

QISPSK commented 7 years ago

@ifoche Thank you! We've been testing it today and haven't found any bugs. The sync time reduction is huge and will make phone setup and testing significantly easier. Most of the team is in the field training providers on the app; they will install and test the BB build during the training tomorrow and report back if they run into any major issues. I'll keep you in the loop and let you know if anything crops up.

ifoche commented 7 years ago

@QISPSK that sounds great. Thanks in advance for reporting anything you find.

QISPSK commented 7 years ago

@ifoche Sorry for the slow response on this! We've been on holiday for most of the past week and a half. We've done some testing and haven't run into any issues with the newest BB version and it significantly speeds up the initial setup steps which is great! I've gone ahead and moved this issue to "Done" and think you can go ahead and publish this to the Play Store. Looking forward to the DHIS2 2.25 upgrades and how they will speed up the sync for those already in the field.

QISPSK commented 7 years ago

Hi @ifoche, I'm not sure how long the duplicate data fix will take (sounds like it should be pretty quick), but we are currently setting up phones for a training next week and were curious if you could push the current BB version to the Play Store today/tomorrow. The sync drop down significantly reduces phone setup time and will ensure that the OU switching during the training goes off without a hitch. Let us know if you have any questions. Thanks!

ifoche commented 7 years ago

Hi @QISPSK. OK, I'll push the current version to Google Play, and I'll try to generate the new BB version (with the fix for the duplicates) as soon as possible (hopefully today, or tomorrow at the latest if things get more complicated than expected).

QISPSK commented 7 years ago

@ifoche Thanks for releasing the update! It crashes on my LG G3, but seems to be running just fine on the phones that matter - the LG Leon and LG K7.

ifoche commented 7 years ago

@QISPSK is your LG G3 on Android 6.0 and the others on 4.x or 5.x? There's a problem with 6.0 permissions that we could fix in this new BB version, and that could definitely be the problem on your LG G3. There's a non-obvious way to make it work by manually granting those permissions in 6.0 (to avoid the crash), but we can handle it automatically in the code, so 6.0 would also be compatible without any manual trick. Can you confirm that's the case?
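The 6.0-only crash fits Android's runtime-permission model: from Android 6.0 (API level 23), apps targeting API 23 or higher must request "dangerous" permissions while running, whereas on 4.x/5.x they are granted at install time. The pure-Java sketch below shows only the decision logic, under that assumption; on a device, the granted set would come from `ContextCompat.checkSelfPermission` and the result would be passed to `ActivityCompat.requestPermissions`. The permission names are hypothetical examples, not necessarily the ones this app declares.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Pure-Java sketch of the startup decision an Android 6.0+ app has to make.
// On a device: "granted" would come from ContextCompat.checkSelfPermission(...),
// and the returned list would be passed to ActivityCompat.requestPermissions(...).
final class RuntimePermissions {

    /** Android 6.0 (Marshmallow) is API level 23. */
    static final int MARSHMALLOW = 23;

    /** Dangerous permissions that still have to be requested at runtime. */
    static List<String> toRequest(int sdkInt, List<String> dangerous, Set<String> granted) {
        if (sdkInt < MARSHMALLOW) {
            return List.of(); // granted at install time on 4.x/5.x
        }
        return dangerous.stream()
                .filter(p -> !granted.contains(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical permission list; the app's real manifest may differ.
        List<String> dangerous = List.of(
                "android.permission.READ_PHONE_STATE",
                "android.permission.WRITE_EXTERNAL_STORAGE");
        System.out.println(toRequest(22, dangerous, Set.of())); // []
        System.out.println(toRequest(23, dangerous, Set.of("android.permission.READ_PHONE_STATE")));
    }
}
```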