dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/

Unable to extract satisfactory equations from experimental data (The equations are underfitted) #329

Closed: RzaRamezanii closed this issue 1 year ago.

RzaRamezanii commented 1 year ago

Hi there,

I hope this message finds you well. I am working on a thesis on a turbulent shear layer, for which I have time-series data from two-component laser Doppler anemometry (2D LDA). My research focuses on discovering the governing equations with the SINDy algorithm. However, despite more than a year of attempts, I have not been able to obtain a satisfactory model with SINDy.

The original data set comprises 5000 samples with four features:

  1. U_vel [m/s]
  2. U_timestamps [ms]
  3. V_vel [m/s]
  4. V_timestamps [ms]

[Screenshot; image not available]

Notably, the U and V velocity components are not recorded simultaneously: each has its own timestamps, with a time gap between them. I addressed this with preprocessing (resampling and imputation): using a pivot in Python, I merged U_vel and V_vel onto a single timestamp column. However, the equations SINDy returns are severely underfitted. I have tried varying the hyperparameters, optimizers, etc., but nothing has worked.
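One way to put the two channels on a shared time base is sketched below. It is not the pivot/imputation code from the screenshots: it swaps in plain linear interpolation with `np.interp` onto a uniform grid restricted to the interval where both channels have data. The function name `align_channels`, the grid spacing `dt` (in ms, matching the timestamps) and the synthetic signals are illustrative assumptions. Because LDA samples arrive at irregular times, interpolation fills the gaps with straight lines and so smooths the signal between samples; the derivatives SINDy estimates from the resampled series inherit that smoothing.

```python
import numpy as np


def align_channels(t_u, u, t_v, v, dt):
    """Interpolate two asynchronously sampled signals onto one uniform time grid.

    Only the interval covered by both channels is kept, so nothing is extrapolated.
    Times and dt must share units (ms for the LDA timestamps).
    """
    # np.interp needs increasing sample times, so sort each channel first.
    iu, iv = np.argsort(t_u), np.argsort(t_v)
    t_u, u = np.asarray(t_u)[iu], np.asarray(u)[iu]
    t_v, v = np.asarray(t_v)[iv], np.asarray(v)[iv]
    # Overlap of the two records.
    t_start = max(t_u[0], t_v[0])
    t_end = min(t_u[-1], t_v[-1])
    # Uniform grid with spacing dt that ends at t_end or just before it (up to rounding).
    n = int(np.floor((t_end - t_start) / dt)) + 1
    t = t_start + dt * np.arange(n)
    # Linear interpolation of each channel at the common times: column 0 = U, column 1 = V.
    X = np.column_stack([np.interp(t, t_u, u), np.interp(t, t_v, v)])
    return t, X


# Synthetic stand-in for the LDA record: 5000 random arrival times per channel.
rng = np.random.default_rng(0)
t_u = np.sort(rng.uniform(0.0, 100.0, 5000))  # ms
t_v = np.sort(rng.uniform(0.0, 100.0, 5000))  # ms
u = np.sin(0.1 * t_u)
v = np.cos(0.1 * t_v)
t, X = align_channels(t_u, u, t_v, v, dt=0.05)
print(t.shape, X.shape)
```

The resulting `X` has one row per grid time and one column per velocity component, which is the layout SINDy expects for its state matrix.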

PRE-PROCESSING:

[Screenshots of Jupyter notebook cells; not available as text]

PROCESSING:

[Screenshots of Jupyter notebook cells; not available as text]
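For reference, the core of SINDy fits in a few lines. The sketch below is a hypothetical NumPy re-implementation, not pysindy's code and not the processing from the screenshots: a degree-2 polynomial library in the two states (u, v), finite-difference derivatives via `np.gradient`, and sequentially thresholded least squares (STLSQ). pysindy's `STLSQ` optimizer follows the same fit, threshold, refit idea (its version also adds ridge regularization). The helper names `poly_library` and `stlsq` are illustrative, and the check uses a synthetic damped oscillator with known coefficients, not the LDA data. The `threshold` is the main knob: any true coefficient smaller in magnitude than it is zeroed, so a threshold that is too large relative to the real terms yields an overly sparse, underfitted model.

```python
import numpy as np


def poly_library(X):
    """Degree-2 polynomial library for two states: [1, u, v, u^2, u v, v^2]."""
    u, v = X[:, 0], X[:, 1]
    names = ["1", "u", "v", "u^2", "u v", "v^2"]
    theta = np.column_stack([np.ones_like(u), u, v, u * u, u * v, v * v])
    return theta, names


def stlsq(theta, dxdt, threshold, max_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each equation using only the library terms that survived.
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi


# Synthetic check with a known answer (NOT the LDA data):
# du/dt = -0.1 u + v,  dv/dt = -u - 0.1 v,  exact solution
# u = exp(-0.1 t) cos t,  v = -exp(-0.1 t) sin t.
dt = 0.01
t = np.arange(0.0, 20.0, dt)
X = np.column_stack([np.exp(-0.1 * t) * np.cos(t), -np.exp(-0.1 * t) * np.sin(t)])
dXdt = np.gradient(X, dt, axis=0)  # second-order central differences in the interior
theta, names = poly_library(X)
xi = stlsq(theta, dXdt, threshold=0.05)
for k, lhs in enumerate(["du/dt", "dv/dt"]):
    terms = [f"{xi[i, k]:+.3f} {name}" for i, name in enumerate(names) if xi[i, k] != 0.0]
    print(lhs, "=", " ".join(terms))
```

On noisy measurements, plain finite differences amplify the noise, so how the derivatives are estimated can matter as much as the threshold; pysindy offers alternatives such as `SmoothedFiniteDifference`.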

I have included all the relevant information about my work and attached the original data as an Excel file. I would appreciate your help with the processing step.

Best regards,
Reza Ramezani

Attachment: 1230270 5 Heshmat.xlsx

Jacob-Stevens-Haas commented 1 year ago

Hey Reza, a few things before we look into this:

  1. Post your code, formatted for GitHub markdown, preferably with syntax highlighting, instead of screenshots of a Jupyter notebook. See here: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks
  2. Don't post your data as a file I need to download from the internet. Post it as a numpy array inside a spoiler so that we can open the spoiler, copy it into np.array(<pasted data>), then minimize the spoiler to get back to the conversation. See here
  3. Be more specific about what you're asking. "Unsatisfactory results" isn't enough. What would you consider a satisfactory result? Describe the concrete problem, for example: "A simulation of the discovered model diverges rapidly after two timesteps."
  4. Post "office hours" kinds of questions as a discussion, not an issue. SINDy is a research method, and there are data sets, measurement types, and dynamics for which it performs worse. If you're asking the community for help on your research, it's better to keep that kind of ask outside of the issues.
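A minimal sketch of item 2, assuming the data are already in a NumPy array: it prints the full array, without NumPy's `...` summarization, inside a GitHub `<details>` spoiler with a `python` code fence, ready to paste into a comment. The helper name `spoiler_markdown` is hypothetical.

```python
import sys

import numpy as np


def spoiler_markdown(data, summary="Data (click to expand)"):
    """Return GitHub markdown that hides the full array inside a <details> spoiler."""
    # threshold=sys.maxsize turns off NumPy's "..." summarization of long arrays;
    # by default floats are printed with at most 8 fractional digits.
    body = np.array2string(np.asarray(data), separator=", ", threshold=sys.maxsize)
    fence = "`" * 3  # built indirectly so this snippet can itself live in a code fence
    return (
        f"<details>\n<summary>{summary}</summary>\n\n"
        f"{fence}python\n{body}\n{fence}\n\n"
        "</details>"
    )


# Example with a small synthetic (n_samples, 2) array of U and V values.
X = np.column_stack([np.linspace(0.0, 1.0, 5), np.linspace(1.0, 0.0, 5)])
print(spoiler_markdown(X))
```

Pasting the printed text into a comment gives a collapsed block; readers can copy the bracketed values into `np.array(...)`.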
RzaRamezanii commented 1 year ago

Dear Jacob,

Thank you very much for your quick response, your kindness, and your key notes. I have learned a lot from your points about how to post properly on GitHub.

I will repost my questions tomorrow in the way you recommended.

Best regards, Reza

(In reply to Jacob Stevens-Haas's comment of Wed, May 3, 2023, 22:02.)

RzaRamezanii commented 1 year ago

Dear Jacob,

I have reposted my question as a discussion instead of an issue.
I look forward to your response and to help from the SINDy community with my research.