Has anyone achieved good accuracy with this project on their own data?

xinlnix commented 1 year ago

I have tried this project on some of my own data, but the accuracy is very low and the results cannot be used in practice. Has anyone achieved good accuracy on their own data?

WassimTenachi commented 1 year ago

Hi @xinlnix, I'm sorry that you are disappointed with symbolic regression. Please keep in mind that SR with neural networks is still in its infancy!

However, even on very complicated and noisy data, physo usually manages to produce at least one symbolic model with R2 > 0.9. Did you set up your physical units properly? It could be that physo is prevented from formulating a good model because of a bad units configuration.

I suggest disabling units constraints by leaving the units field empty. Can you share your code and/or your data?
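As a rough sketch of what "leaving the units field empty" could look like, the snippet below builds two sets of keyword arguments, one with unit vectors and one without. The argument names are the ones used in the `physo.SR` call later in this thread. The helper `sr_kwargs` is hypothetical, and it is an assumption that `physo.SR` treats omitted units arguments as dimensionless, so please check the documentation of your physo version.

```python
def sr_kwargs(use_units=True):
    """Build keyword arguments for physo.SR, with or without unit constraints."""
    kwargs = {
        "X_names": ["vg", "vd"],
        "y_name": "ids",
        "free_consts_names": ["v0", "v1", "i0"],
    }
    if use_units:
        # Unit vectors in a [volt, ampere] system.
        kwargs.update({
            "X_units": [[1, 0], [1, 0]],
            "y_units": [0, 1],
            "free_consts_units": [[1, 0], [1, 0], [0, 1]],
        })
    return kwargs


with_units = sr_kwargs(use_units=True)
without_units = sr_kwargs(use_units=False)

# Arguments that are dropped when unit constraints are disabled.
print(sorted(set(with_units) - set(without_units)))

# Hypothetical usage (assumes omitted units are accepted by physo.SR):
# expression, logs = physo.SR(X, y, **without_units)
```

Comparing a run with and without the units arguments is a quick way to see whether the units configuration is what is holding the search back.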

Cheers. Wassim

xinlnix commented 1 year ago

Thanks a lot for your reply. Here is the link to the csv file. The inputs are vg and vd, and the output is ids. The unit of vg and vd is V, and the unit of ids is A.

xinlnix commented 1 year ago

The updated data link is: https://drive.google.com/file/d/1HbedKbnGL9C_lKaZBpN1ussg2W_xmMzC/view?usp=sharing

xinlnix commented 1 year ago

@WassimTenachi Thanks

WassimTenachi commented 1 year ago

Hi @xinlnix,

I noticed that your output "ids" contains extremely low values, close to machine epsilon and at a very different scale than "vg" and "vd", which makes free constant optimisation very difficult. You should really rescale these values such that $ids \in [0, 1]$.
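A minimal sketch of such a rescaling, assuming a plain linear (min-max) transform is acceptable. The helper names `minmax_scale` and `minmax_unscale` and the sample currents are hypothetical, not part of physo.

```python
import numpy as np


def minmax_scale(y):
    """Linearly rescale y to [0, 1]; also return (min, range) to undo it later."""
    y = np.asarray(y, dtype=float)
    y_min = y.min()
    y_range = y.max() - y_min
    if y_range == 0:
        raise ValueError("cannot rescale a constant array")
    return (y - y_min) / y_range, (y_min, y_range)


def minmax_unscale(y_scaled, params):
    """Map scaled values back to the original scale."""
    y_min, y_range = params
    return np.asarray(y_scaled, dtype=float) * y_range + y_min


# Hypothetical currents (in amperes) spanning many orders of magnitude.
ids = np.array([1e-13, 5e-10, 2e-7, 3e-5, 1e-3])

ids_scaled, params = minmax_scale(ids)
ids_back = minmax_unscale(ids_scaled, params)
print(ids_scaled)
```

Keeping `params` around lets you convert predictions of the fitted expression back to amperes. Note that when the values span many decades, a linear rescaling still squeezes most points close to 0; taking a logarithm of ids first, as tried later in this thread, spreads them more evenly.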

I would suggest running something along the lines of:

expression, logs = physo.SR(X, y,
                            X_names  = ["vg"  , "vd"  ],
                            X_units  = [[1, 0], [1, 0]],
                            y_name   = "ids",
                            y_units  = [0, 1],
                            fixed_consts       = [1.    ],
                            fixed_consts_units = [[0, 0]],  # dimensionless, same 2-component system as the other units
                            free_consts_names  = ["v0"  , "v1"  , "i0"  ],
                            free_consts_units  = [[1, 0], [1, 0], [0, 1]],
                            op_names = ["mul", "add", "sub", "div", "inv", "n2", "sqrt", "neg", "exp", "log", "sin", "cos"],
)

This uses a unit system [voltage, ampere] and 3 free constants (having the units of the input variables and of the output).
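To make the unit vectors more concrete, here is a small sketch of the bookkeeping they encode: exponents add under multiplication, subtract under division, sums need matching units, and `exp` needs a dimensionless argument. This only illustrates the idea; it is not how physo implements it internally, and the helper functions and the example expression are hypothetical.

```python
import numpy as np

# Unit vectors in a [volt, ampere] system, as in the configuration above.
VOLT = np.array([1, 0])
AMPERE = np.array([0, 1])
DIMENSIONLESS = np.array([0, 0])


def mul(u, v):
    """Units of a product: exponents add."""
    return u + v


def div(u, v):
    """Units of a quotient: exponents subtract."""
    return u - v


def add(u, v):
    """Units of a sum: both terms must share the same units."""
    if not np.array_equal(u, v):
        raise ValueError(f"cannot add units {u} and {v}")
    return u


def exp(u):
    """exp only accepts a dimensionless argument and returns a dimensionless value."""
    if not np.array_equal(u, DIMENSIONLESS):
        raise ValueError(f"exp needs a dimensionless argument, got {u}")
    return DIMENSIONLESS


# Example expression: i0 * (vg / v0 + exp(v1 / (v0 + vd)))
vg = vd = v0 = v1 = VOLT
i0 = AMPERE
result = mul(i0, add(div(vg, v0), exp(div(v1, add(v0, vd)))))
print(result)  # same vector as AMPERE
```

An expression such as `exp(vg)` or `vg + i0` would raise here, which is the kind of candidate the units constraints are meant to rule out.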

This should give you the best chances of resolving your modelling problem.

Please keep us updated on the results!

Cheers. Wassim

xinlnix commented 1 year ago

I tried the config and applied a log to ids, but the result is still not ideal. Here are the run results: https://drive.google.com/drive/folders/10_WPO3msfw3saEEGJXaDezJ0d3GVHr7c?usp=sharing

WassimTenachi commented 1 year ago

Hi @xinlnix,

Thanks for sharing your results! I may be missing something, but it looks like the results are very decent, no?

From what I see, physo has converged to this expression $f(vg, vd) = \frac{i_{0} \left(v_{0} e^{\frac{v_{1}}{v_{0} + vd + 10 vg}} + vg\right)}{v_{0}}$ with a fit coef of $R^2 = 0.97$, which is quite good!

The learning curves look normal and I have inspected the fit in a 3D plot:

The fit looks pretty good, except maybe at very low vd values (< 0.05), where your y points are lower than the values predicted by f. This is probably due to a lack of data: only 8% of your data lies in this region, where y drops as vd goes down.

If you want to perfect your fit, I would suggest duplicating the low-vd data points so that they carry more weight in the reward.
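The duplication could be sketched as follows with pandas. The column names follow this thread and the 0.05 threshold is the one mentioned above; the helper `upweight_low_vd`, the number of copies and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def upweight_low_vd(df, threshold=0.05, n_copies=3):
    """Return df with the rows where vd < threshold repeated n_copies extra times."""
    low = df[df["vd"] < threshold]
    return pd.concat([df] + [low] * n_copies, ignore_index=True)


# Synthetic stand-in for the real data set.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "vg": rng.uniform(0.0, 1.0, 1000),
    "vd": rng.uniform(0.0, 1.0, 1000),
})
df["ids"] = 1e-6 * df["vg"] * df["vd"]  # placeholder values, not a device model

frac_before = (df["vd"] < 0.05).mean()
df_aug = upweight_low_vd(df, threshold=0.05, n_copies=3)
frac_after = (df_aug["vd"] < 0.05).mean()
print(f"low-vd fraction: {frac_before:.3f} -> {frac_after:.3f}")
```

The augmented frame can then be turned into `X` and `y` in the same way as the original data. How many copies to add is a judgment call: too many and the fit at higher vd may degrade.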

WassimTenachi commented 1 year ago

If you want to check the fit quality for yourself, here is the code I used, by the way:

import sympy
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import r2_score

##
res_equation = sympy.simplify("(-(((-(vg)/v0)+-(exp((v1/(((((((((((v0+vg)+vg)+vd)+vg)+vg)+vg)+vg)+vg)+vg)+vg)+vg))))))*i0)")

print(res_equation)
print(sympy.printing.latex(res_equation))

def res_func(vg,vd):
    i0 = -2.7882420371810275
    v0 = 3.9984434760528837
    v1 = 4.141095260217377
    y = i0*(v0*np.exp(v1/(v0 + vd + 10*vg)) + vg)/v0
    return y

##
# Calculate the R2 score

df = pd.read_csv("data.csv")
y_target = np.log10(df["ids"])
r2_score_result = r2_score(y_target[np.isfinite(y_target)], res_func(df["vg"], df["vd"])[np.isfinite(y_target)])
print("R2 score:", r2_score_result)

##
# Create a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the 3D scatter plot
ax.scatter(df["vg"], df["vd"], res_func(df["vg"], df["vd"]), c='r', marker='.', alpha=0.1, label = "physo expr.")
ax.scatter(df["vg"], df["vd"], np.log10(df["ids"]),          c='k', marker='.', alpha=0.1, label = "target")

# Set labels for the axes
ax.set_xlabel('vg')
ax.set_ylabel('vd')
ax.set_zlabel('log10(ids)')
ax.legend()
plt.show()
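A single global R2 like the one computed above can hide a poor fit in a small region of the input space. As a hedged sketch, the snippet below computes R2 separately below and above a vd threshold. It runs on synthetic data; `r2_by_region` is a hypothetical helper, and `r2` is a hand-written version of the usual coefficient of determination that ignores non-finite entries.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, ignoring non-finite entries."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ok = np.isfinite(y_true) & np.isfinite(y_pred)
    y_true, y_pred = y_true[ok], y_pred[ok]
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r2_by_region(vd, y_true, y_pred, threshold=0.05):
    """R2 computed separately for vd < threshold and vd >= threshold."""
    vd, y_true, y_pred = (np.asarray(a, dtype=float) for a in (vd, y_true, y_pred))
    low = vd < threshold
    return {
        "low_vd": r2(y_true[low], y_pred[low]),
        "high_vd": r2(y_true[~low], y_pred[~low]),
    }


# Synthetic stand-in: the prediction is exact except for an offset at low vd.
rng = np.random.default_rng(0)
vd = rng.uniform(0.0, 1.0, 2000)
y_true = np.sin(3.0 * vd)
y_pred = np.where(vd < 0.05, y_true + 0.1, y_true)

global_r2 = r2(y_true, y_pred)
scores = r2_by_region(vd, y_true, y_pred)
print("global R2:", global_r2)
print("per region:", scores)
```

With the real data, you would pass `df["vd"]`, the log-transformed target and `res_func(df["vg"], df["vd"])` instead of the synthetic arrays.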

WassimTenachi commented 1 year ago

Hi @xinlnix,

I'm closing this issue now but don't hesitate to re-open it if you need further help.

xinlnix commented 1 year ago

I appreciate your response and the effort you've put in. While PhySO has managed to achieve an R2 value of 0.97, unfortunately, it falls short of meeting my specific requirements. In contrast, utilizing a Neural Network has enabled me to reach an impressive R2 value of 0.9999. However, the drawback I've encountered is that the Neural Network is too slow.

Once again, I want to express my gratitude for your patient and considerate response.

WassimTenachi commented 1 year ago

Dear @xinlnix,

Yes, symbolic regression is not meant to outperform neural networks in fit quality. Neural networks will generally achieve a better fit and are much easier to train, since they are typically much more flexible.

The advantages of finding an equation are:

- better generalisation capabilities
- interpretability & intelligibility
- faster execution compared to a neural net

Unfortunately this does not include fit quality (at least not on the training range).

Take care. Wassim

WassimTenachi / PhySO

Has anyone achieved good accuracy with this project on their own data? #35