ashokkrish / spatialEpisim

spatialEpisim: Spatial Tracking of Infectious Disease Epidemics using Mathematical Models
GNU General Public License v3.0
8 stars 5 forks source link

Seed data observations outside the simulation spatial domain cause extract operator error #5

Closed bryce-carson closed 2 weeks ago

bryce-carson commented 2 months ago

While attempting to replicate the issue in #4 as described by Ashok in a personal communication, I encountered the following error (which causes the usual busy client UX).

 grc_ppp_2020_1km_Aggregated_UNadj
 Min.   :    0.0                  
 1st Qu.:    2.5                  
 Median :    5.7                  
 Mean   :   49.9                  
 3rd Qu.:   15.7                  
 Max.   :18873.2                  
 NA's   :820316                   
[1] 820316
Warning: Error in h: error in evaluating the argument 'i' in selecting a method for function '[': NA/NaN argument
  1: runApp
bryce-carson commented 2 months ago

The issue is not present if the province of Crete is not cropped from the data when running a default SVEIRD simulation.

bryce-carson commented 2 months ago
 grc_ppp_2020_1km_Aggregated_UNadj
 Min.   :    0.0                  
 1st Qu.:    2.5                  
 Median :    5.7                  
 Mean   :   49.9                  
 3rd Qu.:   15.7                  
 Max.   :18873.2                  
 NA's   :820316                   
[1] 820316
[1] "Susceptible Count before removing initial seed values:  9033978.84338977"
[1] "Susceptible Count after removing initial seed values:  9033978.84338977"
[1] "time =  1"
[1] "time =  2"
[1] "time =  3"
[1] "time =  4"
[1] "time =  5"
[1] "time =  6"
[1] "time =  7"
[1] "time =  8"
[1] "time =  9"
[1] "time =  10"
Output #0, mp4, to 'Infected_MP4.mp4':
  Metadata:
    encoder         : Lavf60.16.100
  Stream #0:0: Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 768x768, q=2-31, 200 kb/s, 90k tbn
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
Adding frame 11 at timestamp 15.00sec (90%) - video stream completed!
[1] "isCropped" "TRUE"     
Warning: Error in h: error in evaluating the argument 'i' in selecting a method for function '[': NA/NaN argument
  1: runApp
bryce-carson commented 2 months ago

The same issue occurs with the simulation options in #8, but cropping the default Congolese provinces.

bryce-carson commented 2 months ago

For all of Greece could set the rasterAgg to a small number (say 5, 10) and re-run?

I will try that; determining if the aggregation factor is related may be important.

bryce-carson commented 2 months ago

For all of Greece could set the rasterAgg to a small number (say 5, 10) and re-run?

The issue occurs with that aggregation factor as well.

bryce-carson commented 2 months ago

Diagnosis

Cropping the province containing the only seed data is the likely source of the issue.

All the following locations are on the island of Crete:

Location lat lon InitialVaccinated InitialExposed InitialInfections InitialRecovered InitialDead
Chania 35.517 24.017 1000 200 100 10 10
Heraklion 35.34 24.134 1000 200 100 10 10
Rethymno 35.369 24.474 1000 200 100 10 10
Ierapetra 35.012 25.741 1000 200 100 10 10
Agios Nikolaos 35.183 25.712 1000 200 100 10 10
Sitia 35.2 26.1 1000 200 100 10 10

Discussion

@ashokkrish, in your email asking for me to prioritize this issue this week, you said

I ran an SVEIRD model simulation for 50 days in the island of Crete in Greece using the above seed data.

You wrote as if you intend for the simulation to be limited to the island of Crete, when what is happening when we crop this island is the simulation is running everywhere in Greece but Crete. The seed data is limited to Crete, so there is nothing for the modelling process to work with when Crete is cropped from the data.

Related issues

Why the issue #4 occurs when the simulation is set to Greece will be determined in #4.

Solution

The simulation can reject the seed data if there is any data point in it which occurs in a cropped province or state. An alternative is to crop such observations from the seed data, and also ensuring that the final seed data has a final length of at least one.

## Ensure there is at least one observation in the seed data.
stopifnot(length(filteredSeedData))
bryce-carson commented 2 months ago

I don't know why @ashokkrish was able to run the simulation, cropping the island of Crete, in #4. Perhaps he used different seed data?

bryce-carson commented 2 months ago

I can replicate this issue when using this seed data COD_InitialSeedData_Beni.csv.

Other options:

bryce-carson commented 2 months ago

Without cropping the map / limiting the simulation to Nord Kivu—i.e. isCropped = FALSE is logged—the simulation will run with the same initial seed data.

bryce-carson commented 2 months ago

Without cropping, the leaflet plot is automatically reducing the zoomed portion to Ituri and Nord Kivu. What are the defaults in the application? Is it automatically cropping the simulation to Ituri and Nord Kivu?

Screen Shot 2024-06-11 at 21 43 27 Screen Shot 2024-06-11 at 21 43 17

ashokkrish commented 2 months ago

Without cropping the map / limiting the simulation to Nord Kivu—i.e. isCropped = FALSE is logged—the simulation will run with the same initial seed data.

Yes it would run as long as a seed data is uploaded. The seed data may be for the whole country or selected set of cities.

What's problematic is if the data is seeded outside the spatial domain. I assume the simulation runs even if you select only one province however the seed data is for two provinces. Correct?

ashokkrish commented 2 months ago

Without cropping, the leaflet plot is automatically reducing the zoomed portion to Ituri and Nord Kivu. What are the defaults in the application? Is it automatically cropping the simulation to Ituri and Nord Kivu?

Without cropping it runs for the whole country for me.

ashokkrish commented 2 months ago

I don't know why @ashokkrish was able to run the simulation, cropping the island of Crete, in #4. Perhaps he used different seed data?

I used GRC_Crete_InitialSeedData.csv and I made sure I cropped Crete on the left panel.

bryce-carson commented 2 months ago

Without cropping the map / limiting the simulation to Nord Kivu—i.e. isCropped = FALSE is logged—the simulation will run with the same initial seed data.

Yes it would run as long as a seed data is uploaded. The seed data may be for the whole country or selected set of cities.

What's problematic is if the data is seeded outside the spatial domain.

I encounter this issue's error when I run a simulation of Greece limited to Athos with the Crete data. Perhaps the issue is caused by the presence of data outside of the domain. More testing will be needed, because this is a difficult to locate error; maybe I'll have better luck running the application in showcase mode to see exactly what is running before the error is encountered; it happens very quickly, however. I'll check in on Shiny debugging methodology first.

bryce-carson commented 2 months ago

I said,

I can replicate this issue when using this seed data _COD_InitialSeedDataBeni.csv.

Other options:

* SVEIRD

* Crop map to / limit simulation to _Nord Kivu_ (North Kivu)

You replied to another thing I said,

Without cropping the map / limiting the simulation to Nord Kivu—i.e. isCropped = FALSE is logged—the simulation will run with the same initial seed data.

Yes it would run as long as a seed data is uploaded. The seed data may be for the whole country or selected set of cities.

What's problematic is if the data is seeded outside the spatial domain. I assume the simulation runs even if you select only one province however the seed data is for two provinces. Correct?

It took me about fifteen minutes to definitively determine where the Manima housing development is, but I saw on the map that it's next to Mambasa and that's definitely within the Ituri province, so the seed data does cover two provinces.

Running a simulation with the Beni seed data and limiting the simulation to either Ituri or Nord Kivu will cause this error, but if I include both provinces the simulation will run.

Conclusion

The cause of this issue, determined through scientific deduction 🧑‍🔬 🧪, is the presence of seed data outside of the simulation domain. Any seed data outside of the simulation domain seems to be an issue. The solution is, then, as I said earlier but with the assumption that cropping worked in the opposite direction to how it does work in the application, to ensure that no point of seed data exists outside of the polygons.

That won't be too difficult because the vector data of a coordinate and a polygon can definitely be quickly checked. We can ensure that all points of seed data exist within some polygon the simulation is including, and if not prevent the simulation from running.

ashokkrish commented 2 months ago

The solution is, then, as I said earlier but with the assumption that cropping worked in the opposite direction to how it does work in the application, to ensure that no point of seed data exists outside of the polygons.

That won't be too difficult because the vector data of a coordinate and a polygon can definitely be quickly checked. We can ensure that all points of seed data exist within some polygon the simulation is including, and if not prevent the simulation from running.

I have wanted this kind of validation for a long time now (almost two years) but told to myself hopefully the user uploads the propre seed data. Yes, we will address in the coming updates. Just wanted to caution you about not getting too deep into this as I would like a stable version by this Friday to get my slides going.

bryce-carson commented 2 months ago

The solution is, then, as I said earlier but with the assumption that cropping worked in the opposite direction to how it does work in the application, to ensure that no point of seed data exists outside of the polygons. That won't be too difficult because the vector data of a coordinate and a polygon can definitely be quickly checked. We can ensure that all points of seed data exist within some polygon the simulation is including, and if not prevent the simulation from running.

I have wanted this kind of validation for a long time now (almost two years) but told to myself hopefully the user uploads the propre seed data. Yes, we will address in the coming updates. Just wanted to caution you about not getting too deep into this as I would like a stable version by this Friday to get my slides going.

Indeed. I'll work on the feature in the unstable branch that I'll create. The stable branch (main) has now received the fixes for the other issues we discussed yesterday.

These are the items on my checklist I wrote yesterday during our meeting which I have not yet worked on:

It's possible to refactor the UI code and make some edits to the server without upsetting the current logic (which would also resolve #18), but I'll leave that to the unstable branch just so that not too much changes in the code and your familiarity with it isn't undermined.

bryce-carson commented 1 month ago

Here are resources I'll be using to resolve this issue, which is in the point in polygon domain of spatial data analysis.

StackOverflow user Robert Hijmans provides four examples using the terra library.

ashokkrish commented 1 month ago

This is excellent. Please go ahead.

bryce-carson commented 1 month ago

@kle6951 and @Toby-exe, this issue can be resolved in stable as well. Please investigate the solution I wrote about in one of the last comments.

I suggest working with the solution outside of the context of our Shiny app until you're comfortable with it. Then write a wrapping function to take a set of coordinates and a country code (countrycode::codelist$iso3c) and return a logical vector.

That way it can be used in a pipeline like the following

  1. User uploads seed data with coordinate pairs
  2. valid <- all(coordinatePairsInsideCountry(uploadedData$coordinates, iso3C()))
Toby-exe commented 4 weeks ago

@ashokkrish please see the latest commit (61564e2ad12f56b29bffd3fd48e957f66c2276d5) for progress on this issue

Toby-exe commented 2 weeks ago

I believe this issue can be closed as of 321ef2b0b465a81f3d265d1cb5f29e1cad400939