cmu-phil / py-tetrad

Makes algorithms/code in Tetrad available in Python via JPype
MIT License
62 stars, 12 forks

About Singularity matrix #12

Closed Joanna-0421 closed 10 months ago

Joanna-0421 commented 1 year ago

When I use the FGES algorithm, the output contains many "Singularity encountered when scoring X|X" messages. Is this normal? If not, how can I solve this problem? Thank you very much.

jdramsey commented 1 year ago

Could you describe your data?

Joanna-0421 commented 1 year ago

The data is an open-source multivariate time series dataset called SMAP. Each time series contains 55 features. More details about this dataset can be found at https://www.kaggle.com/datasets/patrickfleith/nasa-anomaly-detection-dataset-smap-msl. I use all of the training data to learn the causal graph with FGES. Thank you very much.

jdramsey commented 1 year ago

Oh, cool, a rover dataset! A long time ago, I worked on some data for an earlier rover. This is going to require some thought, though, so give me a few days. I'll get to it.

The basic problem with singularities is just that: some variables are exact linear combinations of other variables. If you want to use a score that inverts submatrices of the correlation matrix, you need to remove some of those variables from the data, since otherwise the inversion will throw singularity exceptions. But until I look at the data, I can't give you much useful advice.
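A minimal sketch of how one might look for such variables before running the search, assuming the data is already loaded as a samples-by-features NumPy array. It uses plain NumPy rather than any Tetrad API, and the function name `independent_columns` is hypothetical. It greedily keeps each column only if it raises the rank of the centered data, so constant columns and exact linear combinations of earlier columns are flagged for removal:

```python
import numpy as np

def independent_columns(data):
    """Split column indices into (kept, dropped).

    A column is dropped when, after centering, it is numerically a linear
    combination of the columns kept so far (this includes constant columns).
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    kept, dropped = [], []
    for j in range(centered.shape[1]):
        candidate = centered[:, kept + [j]]
        # The column is useful only if it increases the rank by one.
        if np.linalg.matrix_rank(candidate) == len(kept) + 1:
            kept.append(j)
        else:
            dropped.append(j)
    return kept, dropped

# Toy data: column 2 is an exact combination of columns 0 and 1,
# and column 3 is constant.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
toy = np.column_stack([x0, x1, 2 * x0 - x1, np.full(200, 5.0)])
kept, dropped = independent_columns(toy)
```

Which member of a dependent group gets dropped depends on column order here, so it is worth checking the dropped list against what the variables actually mean before discarding anything.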

You could treat the data all as discrete, which should, in principle, work if the sample size is big enough. That is, add a discretization step.
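As a rough illustration of such a discretization step, here is a sketch that bins each column into a few equal-width categories with plain NumPy. The function name `discretize_equal_width` and the choice of three bins are assumptions for the example, not Tetrad's own discretizer, and the resulting integer codes would still need to be handed to the search as discrete variables:

```python
import numpy as np

def discretize_equal_width(data, n_bins=3):
    """Map each column to integer codes 0..n_bins-1 using equal-width bins."""
    data = np.asarray(data, dtype=float)
    codes = np.zeros(data.shape, dtype=int)
    for j in range(data.shape[1]):
        col = data[:, j]
        lo, hi = col.min(), col.max()
        if hi == lo:
            continue  # constant column: leave every value in category 0
        # Interior bin edges only; np.digitize assigns 0 below the first edge
        # and n_bins - 1 at or above the last one.
        edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
        codes[:, j] = np.digitize(col, edges)
    return codes

example = np.array([[0.0, 7.0],
                    [1.0, 7.0],
                    [2.0, 7.0],
                    [3.0, 7.0],
                    [4.0, 7.0],
                    [5.0, 7.0]])
example_codes = discretize_equal_width(example, n_bins=3)
```

Keeping the number of categories small matters for the sample-size caveat: the number of cells a discrete score has to estimate grows quickly with the number of categories per variable.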

I've thought of doing a Linear Gaussian BIC score that uses generalized matrix inversion but don't think it will be available in the near future.
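To make the idea concrete, here is a sketch of how a generalized (Moore-Penrose) inverse could stand in for the ordinary inverse when computing the residual variance of a variable given a set of parents. This is only an illustration in NumPy under stated assumptions, not a Tetrad score: the function name `residual_variance` and the `rcond` cutoff are choices made for the example.

```python
import numpy as np

def residual_variance(cov, target, parents, rcond=1e-10):
    """Variance of `target` left after linear regression on `parents`.

    Uses the pseudo-inverse of the parents' covariance submatrix, so it
    stays finite even when that submatrix is singular.
    """
    if not parents:
        return cov[target, target]
    s_pp = cov[np.ix_(parents, parents)]   # parents-by-parents block
    s_py = cov[parents, target]            # parents-to-target covariances
    return cov[target, target] - s_py @ np.linalg.pinv(s_pp, rcond=rcond) @ s_py

# Toy data: columns 1 and 2 are identical, so their covariance block is singular.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
y = z + rng.normal(size=500)
cov = np.cov(np.column_stack([y, z, z]), rowvar=False)

v_duplicated = residual_variance(cov, 0, [1, 2])
v_single = residual_variance(cov, 0, [1])
```

In this toy case the duplicated parent adds no information, so the two residual variances agree up to rounding, whereas an ordinary inverse of the singular block is not defined.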

Joanna-0421 commented 1 year ago

Thank you very much for your help. I will also try some methods based on your ideas.

jdramsey commented 1 year ago

Great! :-)

jdramsey commented 10 months ago

Hi, is this still a live issue? Can I close it?

Joanna-0421 commented 10 months ago

Yes! I don't have any problems now, thank you very much.

jdramsey commented 10 months ago

Great, thanks! :-)