Boundary segment with 3 features

CristianMorasso commented 2 months ago

Hello,

My name is Cristian and I'm using your tool for my project.

What I would like to do is extract the decision boundary that an NN learns during training.

With a 2-feature dataset the tool performs very well, but when I increase the number of features, the output looks strange to me (even though the NN itself performs very well). Here is an example:

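immagine.png (https://github.com/AhmedImtiazPrio/splinecam/assets/61107879/5cc7c185-1e6c-4869-9260-0e7e7a49d5a7) Description: 3-feature case; to plot the data I simply apply the T matrix to map the 3 features to 2.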

I understand that the problem could be related to the T matrix, and hence to the domain.

I tried the three different functions to compute the domain (square), but I couldn't find a way to make it work.

Can you please tell me whether it is possible to compute the boundary with 3 features? Additionally, can you tell me how I can compute the domain correctly for the problem under analysis?

Many thanks in advance.

AhmedImtiazPrio commented 2 months ago

Hi Cristian,

Thanks for reaching out. I do think this is an issue with the projection you are using. Could you elaborate on which domain you are using? And, to confirm, are you using get_proj_mat(domain) to get T? If you want a clean decision boundary between the two clusters, you need to choose a 2D domain along which you expect to see something close to a linear separation.
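To make the projection point concrete, here is a minimal NumPy sketch of what a 2D slice through 3-feature data looks like geometrically. It does not call splinecam and is not how get_proj_mat is implemented; `plane_basis` and `project` are hypothetical helper names, and the random data is a stand-in. It builds the plane through three anchor points, gives each sample its 2D coordinates on that plane, and reports how far each sample sits from the plane. Samples with a large off-plane distance are flattened onto the slice when plotted, so they can end up on the "wrong" side of a boundary that was computed only on the slice.

```python
import numpy as np

def plane_basis(anchors):
    """Return (origin, B) for the plane through three anchor points.

    B has shape (2, d); its rows are orthonormal and span the plane.
    Assumes the three anchors are not collinear.
    """
    a0, a1, a2 = np.asarray(anchors, dtype=float)
    e1 = a1 - a0
    e1 = e1 / np.linalg.norm(e1)
    v = a2 - a0
    e2 = v - (v @ e1) * e1            # Gram-Schmidt: remove the e1 component
    e2 = e2 / np.linalg.norm(e2)
    return a0, np.stack([e1, e2])

def project(X, origin, B):
    """Return 2D coordinates on the plane and each point's distance from it."""
    centered = np.asarray(X, dtype=float) - origin
    coords = centered @ B.T           # shape (n, 2)
    residual = centered - coords @ B  # component orthogonal to the plane
    return coords, np.linalg.norm(residual, axis=1)

# Stand-in data: 200 samples with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

anchors = X[:3]                       # three samples define the slice
origin, B = plane_basis(anchors)
coords, dist = project(X, origin, B)

print("2D coordinates shape:", coords.shape)
print("largest off-plane distance:", dist.max())
```

The anchors themselves have zero off-plane distance, so checking `dist` for the rest of the data is a quick way to see how representative a given slice is.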

CristianMorasso commented 2 months ago

Thanks for your attention. I'm using get_square_slice_from_one_anchor(anchor) to compute the domain, passing the mean vector as the anchor, and then get_proj_mat(domain) to compute the T matrix.

So you are suggesting a T matrix whose projection separates the classes linearly? In that case, if the data is complex (with >3 important features), it may fail, right?

AhmedImtiazPrio commented 2 months ago

Yes! If the data is complex, we might not see a smooth boundary between the distributions. However, I would suggest trying three random samples from the two distributions with the get_square_slice_from_centroid function; that might look better. If the result isn't satisfactory, try taking the mean of each of the two distributions plus one random sample from one of the distributions.
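The two anchor choices suggested above can be sketched in plain NumPy as follows. This only shows how the three anchor points might be picked; `random_anchors` and `mean_anchors` are hypothetical helper names, the two Gaussian blobs are stand-in data, and the call to get_square_slice_from_centroid is left out because its exact signature is not shown in this thread.

```python
import numpy as np

# Stand-in data: two 3-feature classes, 100 samples each.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-2.0, 0.0, 0.0], scale=0.5, size=(100, 3))
X1 = rng.normal(loc=[2.0, 0.0, 0.0], scale=0.5, size=(100, 3))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 100)

def random_anchors(X, rng, k=3):
    """Option 1: k distinct random samples drawn from the pooled data."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]

def mean_anchors(X, y, rng, sample_class=0):
    """Option 2: the two class means plus one random sample from one class."""
    means = [X[y == c].mean(axis=0) for c in (0, 1)]
    pool = X[y == sample_class]
    extra = pool[rng.integers(len(pool))]
    return np.stack(means + [extra])

anchors_a = random_anchors(X, rng)
anchors_b = mean_anchors(X, y, rng)

print(anchors_a.shape, anchors_b.shape)  # both are (3, 3): three anchors, three features
```

Either array would then be handed to the domain function mentioned above, and the resulting domain to get_proj_mat(domain).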

CristianMorasso commented 2 months ago

Many thanks for your suggestions. Unfortunately, I've already tried that approach on a synthetic dataset, and computing the mean can be a problem when each of the two classes consists of two clusters.

Thanks again :)
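The problem with class means in the two-clusters-per-class case can be reproduced with a small synthetic example. The sketch below is an illustration under assumed data (an XOR-like layout with made-up cluster centers), not the dataset used in this thread. Both class means collapse onto roughly the same point, so they cannot serve as two distinct anchors. Per-cluster centroids stay well separated; using them as anchors is one possible option that this thread does not discuss, and on real data the cluster labels would have to come from a separate clustering step that is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed XOR-like layout: each class is made of two clusters in 3-feature space.
centers = np.array([
    [-2.0, -2.0, 0.0],   # class 0, cluster 0
    [ 2.0,  2.0, 0.0],   # class 0, cluster 1
    [-2.0,  2.0, 0.0],   # class 1, cluster 2
    [ 2.0, -2.0, 0.0],   # class 1, cluster 3
])
cluster_class = np.array([0, 0, 1, 1])
n_per_cluster = 100

X = np.vstack([rng.normal(loc=c, scale=0.3, size=(n_per_cluster, 3)) for c in centers])
cluster = np.repeat(np.arange(4), n_per_cluster)
y = cluster_class[cluster]

# Class means: both land near the origin, between the clusters.
mean0 = X[y == 0].mean(axis=0)
mean1 = X[y == 1].mean(axis=0)
print("distance between class means:", np.linalg.norm(mean0 - mean1))

# Per-cluster centroids: these remain far apart.
centroids = np.stack([X[cluster == k].mean(axis=0) for k in range(4)])
pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
print("smallest distance between cluster centroids:",
      pairwise[np.triu_indices(4, k=1)].min())
```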
