AleksandarSekulic / RFSI

Random Forest Spatial Interpolation
RFSI prediction maps #2

Open armitakar opened 2 years ago

armitakar commented 2 years ago

Hi, I am preparing an RFSI model for predicting PM2.5 using some spatio-temporal variables (temperature, humidity) and some land-use variables (vegetation, density, etc.). The output variable is the PM2.5 value at a specific hour (e.g., 8am) of a specific day. I am using hourly average PM2.5 data from July 21 to Apr 22. I only have 31 sensors for the Columbus metro area. The sensors generating PM2.5 data are not uniformly distributed across the study area; they are concentrated toward the center.

I also performed CV for the models, and the accuracy measures are pretty good. The issue is that the prediction map shows some reference lines that look odd. I am not sure what is causing this. I suspect that having a small number of sensors for making predictions over a large area might be the issue here. The reference lines may reflect the distance variables measured from the nearest neighbors. I am curious to know if anyone else has encountered the same issue and has some suggestions for me.

I also performed spatio-temporal kriging using the same dataset, and the map looks much better there. Red dots are the sensor locations.

RFSI image

STRK image

Thank you!

AleksandarSekulic commented 1 year ago

Hi, you are right. One of the reasons is the small number of stations. The other, and the main one, is the nature of RFSI, i.e. of RF. We also encountered the problem of "artifacts" in one of our papers (https://www.nature.com/articles/s41597-021-00901-2). The problem is that the nearest observations are usually the most important covariates in an RFSI model. We partly solved this by adding the use.idw and idw.p parameters: this way the IDW prediction (computed, of course, without using the observation at the location itself during model calibration) is used as a covariate. You can try it. We are still trying to find a full solution. One other reason for the artifacts could be that you made predictions outside the spatial domain of your stations. This is a form of spatial extrapolation, which is a well-known problem for RF-based ML methods.
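To illustrate the idea of an IDW covariate that does not use the observation at the location itself, here is a minimal Python sketch. It is not the RFSI implementation: the function names `idw_predict` and `loo_idw_covariate` are hypothetical, and the exponent `p` is only assumed to play the role that idw.p plays in the package.

```python
import math

def idw_predict(target, points, values, p=2.0, exclude=None):
    """Inverse distance weighted estimate at `target`.

    points  : list of (x, y) station coordinates
    values  : list of observed values, same order as `points`
    p       : distance exponent (assumed analogue of idw.p)
    exclude : index of a station to leave out, or None
    """
    num = 0.0
    den = 0.0
    for i, ((x, y), v) in enumerate(zip(points, values)):
        if i == exclude:
            continue  # skip the left-out station
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a retained station
        w = 1.0 / d ** p
        num += w * v
        den += w
    return num / den

def loo_idw_covariate(points, values, p=2.0):
    """Leave-one-out IDW value at every station, for use as a training covariate."""
    return [idw_predict(pt, points, values, p, exclude=i)
            for i, pt in enumerate(points)]

# Toy data (made up for illustration only)
stations = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
pm25 = [1.0, 2.0, 4.0]
covariate = loo_idw_covariate(stations, pm25, p=2.0)
```

At prediction locations there is nothing to leave out, so `idw_predict` would be called with `exclude=None`. Leaving each station out during calibration keeps the covariate from simply reproducing the target value.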

I'll let you know if I find a solution to this problem.
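As a rough way to check the spatial-extrapolation point above, one can flag prediction locations that fall outside the convex hull of the stations. The following is a standard-library Python sketch under that assumption; the helper names are hypothetical, and a convex hull is only one possible definition of the "spatial domain" of the stations.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(pt, hull):
    """True if `pt` lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull: treat every point as outside
    return all(cross(hull[i], hull[(i + 1) % n], pt) >= 0 for i in range(n))

# Toy station layout (made up for illustration only)
stations = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
hull = convex_hull(stations)
grid_cells = [(1, 1), (5, 1)]
extrapolated = [pt for pt in grid_cells if not inside_hull(pt, hull)]
```

Cells listed in `extrapolated` could be masked or treated with extra caution on the prediction map.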