glederrey / DATGAN

Directed Acyclic Tabular GAN (DATGAN) for integrating expert knowledge in synthetic tabular data generation
GNU General Public License v3.0

Performance Metrics #2

Closed erenarkangil closed 7 months ago

erenarkangil commented 2 years ago

Hi there,

Thank you for sharing this repo. Your methodology is really impressive. I am a PhD-level researcher at UQ, currently trying to use this algorithm for population upsampling. However, I am stuck on interpreting the SRMSE results:

Since I am upsampling the population, the difference between the prediction and the sample data is high. If I generated a synthetic sample of the same size as the real data, this wouldn't be a problem. I could not find this information in the paper. What would be the right strategy for comparing an upsampled dataset?

Alternatively, I could use percentages, but I don't feel comfortable with that because SRMSE is already normalized by the mean, and I'm not sure it would make sense to use percentages in that case.
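For context, one way to make SRMSE size-invariant is to compute it on relative frequencies rather than raw counts, so a 10x-upsampled population and the original sample become directly comparable. This is just a sketch of that idea (the function name and details are mine, not from the paper, so please check it against the metric definition actually used there):

```python
import numpy as np
import pandas as pd

def srmse_proportions(real: pd.Series, synthetic: pd.Series) -> float:
    """SRMSE computed on per-category proportions instead of counts:
    RMSE of the relative frequencies, divided by their mean.
    Because proportions sum to 1, dataset size drops out entirely."""
    cats = sorted(set(real.unique()) | set(synthetic.unique()))
    p_real = real.value_counts(normalize=True).reindex(cats, fill_value=0.0)
    p_syn = synthetic.value_counts(normalize=True).reindex(cats, fill_value=0.0)
    rmse = np.sqrt(((p_real - p_syn) ** 2).mean())
    return float(rmse / p_real.mean())

# A 10x-larger synthetic set with the same category mix scores 0:
real = pd.Series(list("aaabbc") * 100)        # 600 "real" rows
big_synth = pd.Series(list("aaabbc") * 1000)  # 6000 synthetic rows, same mix
print(srmse_proportions(real, big_synth))     # -> 0.0
```

The same idea extends to the bivariate/trivariate SRMSE variants by normalizing the joint contingency tables before comparing them.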

Kind Regards

glederrey commented 2 years ago

Hi,

Thanks for your message. I'm really happy that people are trying my methodology on their own projects. =)

So, I used SRMSE (and other metrics) because these are the usual tests in the transportation literature. My goal was just to add a more systematic/robust way of using them. But I'm not a fan of these metrics. For example, what does an SRMSE of 0.1 mean compared to a value of 1? I have no idea. So, it's quite tough to compare. That's why, in the end, I ranked the models based on the results without comparing the differences in values.

In your case, it seems that you have a well-defined application for this methodology. Thus, I think you would benefit from using a metric that reflects this application. We had a long discussion about synthetic data assessment during my PhD defense, and the conclusion was that we should use application-based metrics. For example, I worked with a student to develop a way to assess synthetic data when it is used to augment a dataset.

The idea is to check how adding synthetic data to real data affects the accuracy of an ML model. So, we chose an ML method (logistic regression is a bad choice here, since it does not necessarily need more data to perform better). We first trained the model using only real data. Then, we augmented the dataset so that it contained 10% synthetic data, then 20%, and so on, until we reached 50% synthetic data and 50% real data. Once you have this, you can see how the accuracy evolves with the proportion of synthetic data. An example of a result is shown below:

[image: model accuracy vs. proportion of synthetic data (augmentation)]
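The protocol above can be sketched in a few lines. Everything here is illustrative, not the code we used: the toy two-Gaussian data stands in for a real tabular dataset, the "synthetic" rows are just a noisier copy standing in for generator output, and the tiny nearest-centroid classifier keeps the snippet dependency-free:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_data(n, noise=1.0):
    # Two Gaussian classes standing in for a real tabular dataset
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=noise, size=(n, 2))
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    # Tiny nearest-centroid classifier, just to keep the sketch self-contained
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return float((pred == y_te).mean())

n = 600
X_real, y_real = toy_data(n)
X_test, y_test = toy_data(600)
X_syn, y_syn = toy_data(3 * n, noise=1.5)  # noisier stand-in for generator output

# Baseline on real data only, then add synthetic rows until the training
# set is 50% synthetic, always keeping all real rows.
accuracies = {}
for frac in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    n_syn = int(round(frac / (1.0 - frac) * n))
    X_tr = np.vstack([X_real, X_syn[:n_syn]])
    y_tr = np.concatenate([y_real, y_syn[:n_syn]])
    accuracies[frac] = centroid_accuracy(X_tr, y_tr, X_test, y_test)
    print(f"{frac:.0%} synthetic -> accuracy {accuracies[frac]:.3f}")
```

In a real study you would replace the toy pieces with your actual dataset, your DATGAN output, and a classifier that benefits from more data, then plot accuracy against the synthetic fraction.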

We also ran a similar test where, instead of augmenting the dataset, we simply replaced real data with synthetic data, as shown below:

[image: model accuracy vs. proportion of synthetic data (replacement)]
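The replacement variant only changes how the training set is built: the total size stays fixed and real rows are swapped out for synthetic ones. Again a self-contained sketch with toy stand-ins (none of this is the code we actually used):

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_data(n, noise=1.0):
    # Two Gaussian classes standing in for real tabular data
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=noise, size=(n, 2))
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    # Minimal nearest-centroid classifier (illustration only)
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return float((pred == y_te).mean())

n = 600
X_real, y_real = toy_data(n)
X_syn, y_syn = toy_data(n, noise=1.5)  # stand-in for generator output
X_test, y_test = toy_data(600)

# Keep the training-set size fixed and swap real rows for synthetic ones
replacement_acc = {}
for frac in [0.0, 0.25, 0.5, 0.75, 1.0]:
    k = int(frac * n)
    X_tr = np.vstack([X_real[k:], X_syn[:k]])
    y_tr = np.concatenate([y_real[k:], y_syn[:k]])
    replacement_acc[frac] = centroid_accuracy(X_tr, y_tr, X_test, y_test)
    print(f"{frac:.0%} replaced -> accuracy {replacement_acc[frac]:.3f}")
```

The 100%-replaced point is the interesting extreme: it tells you how well a model trained purely on synthetic data transfers to real data.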

From these two graphs, we drew our conclusions. Note that the DATGAN used in these experiments was a very early version of the current model, and I haven't rerun the experiment with the final one. But in the end, it shows that, in the context of data augmentation, you can get a visual representation of your results without using SRMSE. Of course, this was early work and would need to be made much more robust and better defined (using a different ML algorithm, testing multiple variables, etc.).

So, in the end, my message for you is: come up with your own assessment method, one that reflects the application for which you're using the synthetic data. The assessment needs to happen in a controlled environment, but it should be "similar" to the final application.

I hope that what I tried to explain is clear and that it gives you some ideas for your case. =) Don't hesitate to reach out if you have more questions. We could even have a Zoom meeting to discuss it in more detail. =)