Closed DianaPat closed 1 year ago
Hi @juliohm I already changed to \cdot notation, checked that the diag function really gives a unitary vector and got rid of some outdated notation. But when I tried to change the plots in the visual test, they didn't get plotted all too well: One of the possible changes would be to modify the gr settings to change the size of the image, but even so the graphics at the diagonal keep coming out as lines (just like in the images above), so for now I just did the test with each corner separately and will continue checking the problem of the diagonals.
I already changed to \cdot notation, checked that the diag function really gives a unitary vector and got rid of some outdated notation.
That is great @DianaPat , now we can remove the normalization of alpha inside the rscore function, and only normalize it in the neadermead step. I will propose the changes so that you can review and accept.
But when I tried to change the plots in the visual test, they didn't get plotted all too well:
Let's just plot the result of the corner plot after the transformation in one figure and the result of reverting the transformation in a separate figure. They don't need to be in the same figure. Also, you can pass the option size=(800,800)
to the specific plot commands to overwrite the default size set in the gr
command.
Hi @DianaPat can you please double check that this GitHub option is enabled on this PR?
I fixed the problem locally and would like to push the changes to the PR.
It is enabled
Hi @DianaPat it is saying that permission is denied. Can you incorporate the fixes in your local copy and regenerate the figures?
aspectratio=:equal
from the runtests.jl file and place it in every existing plot command that existed before in the test files. Basically, scale.jl, zscore.jl, eigenanalysis.jl, if I recall correctly.aspectratio
option was the issue.After you make the changes you will see that the projection pursuit result is not looking quite right yet. The final distribution is not spherical visually indicating that either the number of iterations wasn't enough or that some other bug is still hidden somewhere.
Thank you @DianaPat. As you can see the first case is not working as expected given that one of the bivariate distributions doesn't look spherical. Can you please take a closer look at what may be happening? After the apply
is fixed we we need to also plot the revert
.
Yes, I'm already looking for this bug and other ones I've found. I'll update as soon as I find something!
Hi @juliohm, I think the error is fixed.
In fact the function was not comparing the data to a gaussian up to perc
percentile, but with 1-perc
. There was also an issue with just taking a higher percentile because the optim function raised an error. That part was just fixed with the float at line 93.
But then the tests stopped passing (at this point I'm wondering how did they pass before), so I relaxed the conditions.
At this point I think maybe it would be better to change all occurrences of perc
to 1-perc
in the code so we can keep the previous value, as it is more intuitive.
Hi @DianaPat thank you for checking it. Can you please move the float
call to within the definition of the basis
function? This function should return a floating point vector always to avoid type instability.
Regarding the percentile option, you mean that it is more intuitive to use the value .99? It would be great if you could make the appropriate changes. After that, it would be interesting to print the number of structure removal iterations that are happening in both test cases. It seems that the structure is not quite Gaussian still? Visually, the resulting point cloud has some borders that differ from those of a multivariate Gaussian.
I already made some experiments. Without any modification of parameters it runs the structure removal cycle 23 times. By changing the parameters a little, it reached 101 iterations; the problem is that it took aprox. 300 seg, so the tests would take more time.
I saved the figures to do a comparison of improvement:
I am finding it strange that the the points near the "tails" of the resulting distribution are somewhat spread. If you plot the ideal standard Gaussian you will be able to see the difference. Do you know how to use the VSCode debugger? You could add a plot command inside the structure removal loop and then add a breakpoint to see the plot at each iteration. Another thing you can do is check the optimal alphas as we did in class to see if everything is working as expected. Finally, a good diagnostic would be to plot the q-q plot of the transformed columns against the standard Gaussian to see if they pass through the identity line.
You are right!
The tails diverge from a gaussian outside the interval [-3,3]. I double checked the removal structure function but it's working as expected.
I went back to the bibliography (Friedman pg 255) and found this:
When I tried increasing the deg
argument, there was a slight improvement, but it came at the cost of time.
That is a very good catch. Can you comment on the time increase? Is it 2x slower? How the plots look like with higher degrees? Can we find a good compromise by default?
Em dom., 16 de out. de 2022 19:46, DianaPat @.***> escreveu:
You are right! The tails diverge from a gaussian outside the interval [-3,3]. I double checked the removal structure function but it's working as expected. I went back to the bibliography (Friedman pg 255) and found this: [image: Tails_under] https://user-images.githubusercontent.com/105749646/196061909-119106f3-f4c8-461f-a664-4cc8d7baff4b.PNG When I tried increasing the deg argument, there was a slight improvement, but it came at the cost of time.
— Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/pull/141#issuecomment-1280074599, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3MAVSGX4ULX2O5SD7TWDSAUBANCNFSM6AAAAAAQZXLHKQ . You are receiving this because you were mentioned.Message ID: @.***>
The time increase is ~3.5x:
With deg=5
it took around 16 sec to obtain the answer (image on the left), and with deg=30
it took 352 sec (image on the right)
I think the improvement is not enough to offset the cost. An idea could be to do another process post-transformation to normalize the tails.
Thank you @DianaPat for these careful checks. I agree with you that the time cost does not pay off, and we just need to be aware of this behavior near the tails.
I will try to review the PR one more time during the day. It would be important to plot the transformed and reversed data for both data sets if that is not already done.
That is really awesome @DianaPat. I think we finally made it! 🎉 Congrats!
One additional change we could implement to avoid depending on the heavy Optim.jl package is replace it with the NelderMead.jl package, which only implements the algorithm are currently using: https://github.com/jwscook/NelderMead.jl
You can learn more about the algorithm in this lecture: https://youtu.be/vOYlVvT3W80 It is beautiful and simple idea.
In theory, we would just need to replace the Optim.optimize
function by the NelderMead.optimise
function and rerun the tests. If you feel that you have some time to try this out, please go ahead. Otherwise, I invite you to review the original PR you submitted to compare how much you improved your coding skills in the process 👏🏽
These picky reviews pay off in the long run, and I am really happy you went all the way to the end! 🥇
Thank you, I learned a lot from this experience!
On the other hand, I tried to implement the change of package, but it seems that the autor have not registered the package, so I couldn't do it. For now will leave it as is while I look for other alternatives.
I've opened an issue requesting registration: https://github.com/jwscook/NelderMead.jl/issues/5
@DianaPat NelderMead.jl is now registered 👌🏽 would you like to give it a try here to see if the results are the same?
Codecov Report
100.00% <ø> (ø)
96.80% <96.80%> (ø)
95.00% <0.00%> (-0.07%)
100.00% <0.00%> (+5.55%)
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.