Evaluation gives different scores from the paper

Thank you for this great work.

I'm currently working on light estimation and would like to compare my method with StyleLight.

However, when I ran your code on the Laval Indoor dataset, the scores I obtained differ noticeably from those reported in the paper. Is this difference within an acceptable range? If not, could you point me to the correct way to run the evaluation?

| Metric | Paper: M (mirror ball) | My run: M (mirror ball) |
|---|---|---|
| Angular Error | 4.30 | 6.74 |
| RMSE | 0.56 | 0.58 |
| si-RMSE | 0.55 | 0.56 |

| Metric | Paper: S (silver matte ball) | My run: S (silver matte ball) |
|---|---|---|
| Angular Error | 2.96 | 4.61 |
| RMSE | 0.30 | 0.33 |
| si-RMSE | 0.29 | 0.32 |

| Metric | Paper: D (diffuse ball) | My run: D (diffuse ball) |
|---|---|---|
| Angular Error | 2.41 | 4.07 |
| RMSE | 0.15 | 0.16 |
| si-RMSE | 0.11 | 0.13 |
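To quantify the discrepancy, here is a small Python script that computes, from the numbers in the tables above, how much higher each of my scores is than the reported one (lower is better for all three metrics). The names `SCORES` and `relative_gap` are just my own helpers.

```python
# Scores copied from the tables above: (paper, my run) per ball and metric.
SCORES = {
    "M": {"Angular Error": (4.30, 6.74), "RMSE": (0.56, 0.58), "si-RMSE": (0.55, 0.56)},
    "S": {"Angular Error": (2.96, 4.61), "RMSE": (0.30, 0.33), "si-RMSE": (0.29, 0.32)},
    "D": {"Angular Error": (2.41, 4.07), "RMSE": (0.15, 0.16), "si-RMSE": (0.11, 0.13)},
}


def relative_gap(paper: float, mine: float) -> float:
    """Percentage by which my score exceeds the paper's score."""
    return 100.0 * (mine - paper) / paper


if __name__ == "__main__":
    for ball, metrics in SCORES.items():
        for name, (paper, mine) in metrics.items():
            print(f"{ball} {name:13s} {relative_gap(paper, mine):+6.1f}%")
```

With these numbers, the angular-error gap is roughly +56% to +69% on all three balls, while the RMSE and si-RMSE gaps are only about +2% to +18%, so the angular error accounts for most of the discrepancy.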

Below are the steps I used to obtain these scores, so you can point out which step I did wrong.

1. Prepare the cropped Laval Indoor dataset by running `data_prepare_laval.py`.
2. Set `root_path` in `test_lighting.py` to the correct input path.
3. Tone-map the ground truth in the directory named `test` using `evaluation/tonemap.py`.
4. Tone-map the StyleLight output using `evaluation/tonemap.py`.
5. Render the ground truth onto the three balls using `evaluation/test_render.sh`.
6. Render the StyleLight output onto the three balls using `evaluation/test_render.sh`.
7. Compute the scores by running `evaluation/test_rmse.sh --fake <StyleLight's tone-mapped dir> --real <ground truth's tone-mapped dir>`.
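To check that I am comparing like with like, here is a minimal sketch of how I understand the three metrics to be commonly defined for rendered balls. This is an assumption on my part: I have not verified that it matches what `evaluation/test_rmse.sh` actually computes, and the optional `mask` argument (to restrict the metrics to the ball pixels) is a hypothetical addition of mine.

```python
from typing import Optional

import numpy as np


def _pixels(img: np.ndarray, mask: Optional[np.ndarray]) -> np.ndarray:
    """Flatten an HxWx3 image to Nx3 float64, keeping only pixels where mask is True (if given)."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    if mask is not None:
        pixels = pixels[mask.reshape(-1).astype(bool)]
    return pixels


def rmse(pred: np.ndarray, gt: np.ndarray, mask: Optional[np.ndarray] = None) -> float:
    """Plain root-mean-square error over all selected pixels and channels."""
    p, g = _pixels(pred, mask), _pixels(gt, mask)
    return float(np.sqrt(np.mean((p - g) ** 2)))


def si_rmse(pred: np.ndarray, gt: np.ndarray, mask: Optional[np.ndarray] = None) -> float:
    """Scale-invariant RMSE: rescale pred by the least-squares scalar alpha first."""
    p, g = _pixels(pred, mask), _pixels(gt, mask)
    # alpha = argmin_a ||a * p - g||^2 = <p, g> / <p, p>
    alpha = np.sum(p * g) / max(np.sum(p * p), 1e-12)
    return float(np.sqrt(np.mean((alpha * p - g) ** 2)))


def angular_error_deg(pred: np.ndarray, gt: np.ndarray, mask: Optional[np.ndarray] = None) -> float:
    """Mean angle in degrees between the per-pixel RGB vectors of pred and gt."""
    p, g = _pixels(pred, mask), _pixels(gt, mask)
    norms = np.linalg.norm(p, axis=1) * np.linalg.norm(g, axis=1)
    cos = np.sum(p * g, axis=1) / np.maximum(norms, 1e-12)
    # Clip so rounding cannot push |cos| slightly above 1 before arccos.
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))
```

Under these definitions, scaling the prediction by a constant leaves si-RMSE and angular error unchanged but not RMSE, so the tone-mapping in steps 3 and 4 should only affect RMSE if it is a pure global scale.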

Best regards.

Wanggcong / StyleLight #9