Path-Check / safeplaces-dct-app

COVID Safe Paths (based on Private Kit) is an open and privacy preserving system to use personal information to battle COVID
https://covidsafepaths.org
MIT License
466 stars 283 forks source link

OutOfMemory Error on devices under 2GB of RAM. #285

Open vitorpamplona opened 4 years ago

vitorpamplona commented 4 years ago

Not much to go on, but it might be the AsyncStorage with huge JSONs in memory.

java.lang.OutOfMemoryError: at java.lang.StringFactory.newStringFromBytes (StringFactory.java:178) at java.lang.StringFactory.newStringFromBytes (StringFactory.java:54) at java.lang.StringFactory.newStringFromBytes (StringFactory.java:46) at com.RNFetchBlob.RNFetchBlobFS.readFile (RNFetchBlobFS.java:225) at com.RNFetchBlob.RNFetchBlob$6.run (RNFetchBlob.java:227) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1162) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:636) at java.lang.Thread.run (Thread.java:764)

E3V3A commented 4 years ago

Good thing to check. Did you check the memory usage in AS?

Also, the app is already 19MB which is in the middle of below. Once, running, who knows what happens. Still interesting how some other apps can get by with 3-5MB and essentially already doing all it is supposed to do, check/track for contacts. Is this project over-engineered?


----------------------------------------------------------
Country/Region      AppName                 APK     Size (MB)
----------------------------------------------------------
Israel              Track Virus             1       12    
Bolivia             Coronavirus Bolivia     2       24    
S.Korea             Self-Diagnosis          3       5     
India/Punjab        COVA Punjab             4       6     
France/San Martín   CoronaISH               5       44    
Singapore           TraceTogether           6       5     
Thailand            NCOVI                   7       11    
Spain/Catalunia     STOP COVID19 CAT        8       3     
UK                  COVID Symptom Tracker   9       43    
Israel              "The Shield"/HaMagen    10      22    
India               CoWin-20                11      xx    
India               Corona Kavach           12      5     
India               COVID19 Feedback        13      47    
Germany             COVID-19                14      23    
----------------------------------------------------------

@vitorpamplona Also, why on earth would you store this JSON in memory? Are you sure? I would have expected an SQLlite DB...

vitorpamplona commented 4 years ago

Feel free to implement a version of the Storage that is not loading memory so much. The current version uses the commonly used react async storage which, if I am not mistaken, needs to load in memory.

E3V3A commented 4 years ago

Not familiar with this type of storage. Seem a bit weird to abstract away simple SQLite into something that is anyway using SQLite3. But I found this:

Current Async Storage's size is set to 6MB. Going over this limit causes database or disk is full error. This 6MB limit is a sane limit to protect the user from the app storing too much data in the database. This also protects the database from filling up the disk cache and becoming malformed (endTransaction() calls will throw an exception, not rollback, and leave the db malformed). You have to be aware of that risk when increasing the database size. We recommend to ensure that your app does not write more data to AsyncStorage than space is left on disk. Since AsyncStorage is based on SQLite on Android you also have to be aware of the SQLite limits.

Add a AsyncStorage_db_size_in_MB property to your android/gradle.properties:

AsyncStorage_db_size_in_MB=10
vitorpamplona commented 4 years ago

It's not weird. It's just way simpler than SQLite.

I think what you found refers to the disk space, not memory. Not sure what to do there.

E3V3A commented 4 years ago

It's not weird. It's just way simpler than SQLite.

Let's just agree to disagree on that. I've had these SQL discussions for almost a decade, and yet, all DBs are still using SQL. Why? Simplicity and clarity.

I think what you found refers to the disk space, not memory. Not sure what to do there.

LoL. Like I said above. If people actually bothered to understand SQL and SQLite3 in particular, you will quickly find that SQLite3 also has memory only implementation and the only relevant limitation is that what was put there by the abstraction layer. I believe that the new RockDB or whatever else they had, was to improve performance for non SSD devices.

Anyway, why not just try with the change proposed above?

vitorpamplona commented 4 years ago

Well, try and see if you can replicate the bug with and without the tag. I still don't understand how a lack of memory can be solved by allowing bigger memory limits.

Btw, I have 25 years of SQL experience. In fact, I coded the business logic of an entire ERP in PLSQL, where stored procedures had 5k+ lines and selects with 34joins and sub-sub-sub queries were common. So, relax. I know what SQL is.

E3V3A commented 4 years ago

@vitorpamplona

So, relax. I know what SQL is.

Yes, I am sure you know your stuff. Please don't take my comments personal. But asking me to replicate something that you experienced, without any info at all how you got that message, is obviously asking the impossible. (From above) How do you even know it came from this app??

Anyway, it's in cases like this, it may be a good idea to implement a simple "debug screen" that the user can copy, to save elsewhere. That screen would contain some limited logcat messages coming from this app.

vitorpamplona commented 4 years ago

This is not something I experienced, unfortunately. This comes from the PlayStore Console. The stack is everything it gives us. And happens every day on such devices. So, it seems to be a very common problem (easy to replicate on an emulator, where you can reduce OS memory until it breaks). It's a good practice to first replicate and find where and how it happens before suggesting solutions. And, of course, we can't merge anything without evidence that actually solves the problem.

kenpugsley commented 4 years ago

@vitorpamplona - do the logs give us anything more specific on the particular device? Make/model, etc? Wondering how we replicate to verify the fix.

vitorpamplona commented 4 years ago

The list is quite big. Most of the budget phones are there. Since this is an OutOfMemory, it can happen on all phones, we just need to load enough info on the memory to trigger similar errors.

Here are the top ones:

Name Count %
Pixel 3a (sargo) 8 7.8%
Pixel 3a XL (bonito) 6 5.8%
moto g(7) power (ocean) 6 5.8%
Pixel 3 (blueline) 5 4.9%
LG K10 (2017) (mlv5) 5 4.9%
Galaxy A5(2016) (a5xelte) 4 3.9%
vitorpamplona commented 4 years ago

People affected in the last 7 days. Blue are crash reports, red are impacted users.

Screen Shot 2020-04-02 at 11 24 46 AM

E3V3A commented 4 years ago

Any of the following should provide you with enough info to track down the problem. But I suggest to disable all other apps and related background services before using:.

troach-sf commented 4 years ago

The issue is in Overlap.js where you pull the public csv dataset from here: https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.csv

This is over 50MB and is too much to be loading straight into memory. This needs to be buffered and stored somewhere without loading the entire csv into memory. This dataset is only going to get larger.

E3V3A commented 4 years ago

For ref: Overlap.js

Unless you have a smart way to parse CSV on-the-fly, I suggest loading it all into a DB and selecting only the data relevant for our location history.

BTW. Where is our location history stored?

Alternatively, not use that particular CSV at all, and instead ask maintainers to split file according to regions or something.


PS & Heads up!

Here's another nice example line, only 900 chars long:

000-1-100,65,female,Shenzhen City,Guangdong,China,22.65389,114.1291,admin2,03.01.2020,10.01.2020,21.01.2020,"cough, fever, weakness",no,29.12.2019 - 04.01.2020,Wuhan,no,"Among the 53 cases diagnosed in our province, 28 were males and 25 were females, aged between 10-81 years old. Among them, 45 cases had a history of living or traveling in Hubei before the onset, and 8 cases had no Hubei before the onset of the disease. Resident history or travel history, 6 of which are imported family members of Hubei, 1 with dinner with friends in Wuhan, 1 without obvious contact history, epidemiological investigations are being carried out. A total of 10 cluster epidemics were found, all of which were family clusters, involving 27 cases. There were no cases of medical staff infection.",,,https://www.thelancet.com/pb-assets/Lancet/pdfs/S0140673620301549.pdf,,,,,,,Shenzhen City,Guangdong,China,440300.0,,
E3V3A commented 4 years ago

Let's see:

So someone need to be able to store at least :
4 x 900 x 100,000,000 = 3.6^11 [bytes] = 327 TB

Clearly we can't do anything of what was just suggested, without being able to first selectively chose the data to be downloaded. We need to use some kind of tile method like most GPS map apps are using to download only data from tiles where user has been.

Patrick-Erichsen commented 4 years ago

@tstirrat Think we can close this one? I believe the work is in-progress on Jira to improve the overall app performance by reducing payload sizes from HAs.

mikeflores2000 commented 4 years ago

Let's see:

  • each UTF-8 char is 4 bytes
  • each line can be 900+ chars
  • we may have ~100 M cases by the end of next month (May).

So someone need to be able to store at least : 4 x 900 x 100,000,000 = 3.6^11 [bytes] = 327 TB

Clearly we can't do anything of what was just suggested, without being able to first selectively chose the data to be downloaded. We need to use some kind of tile method like most GPS map apps are using to download only data from tiles where user has been.

On or after my joining Safe Path on April 25, 2020, I brought up this same bottle neck. Where is data stored for the terra bytes involved? A NoSQL database that can store highly structured columnar data, key value, etc. is best long term and deploy a master contact tracing app that can scale up across state lines and countries.