Open rdstern opened 7 years ago
I spoke to Alex earlier today. He's not in the SSD tomorrow, so we plan to have a Skype call where we can discuss all this. Then on Monday we plan to work together on it.
Alex and I looked over PCAs earlier today and have drawn a few conclusions; I can sort a few of them out now.
Since there could be 400 columns in this dataset, I feel a Ctrl+A option to select everything in the ucrSelector could be handy. I'd like to know others' thoughts on this, though, as I'm not sure how easy it would be!
- Remove "scores" as a sub dialog option, because this is for factor analysis, not principal components analysis.
- Remove "residuals" from the sdg, because this shouldn't be an option with PCA.
- Save eigenvectors to at least a certain length (150?), as these may be used in further analysis. Generally, however, eigenvalues etc. would not be.
- There's a bug when plotting: whichever dimensions are chosen for x and y, it always plots dimensions 1 and 2 at the moment.
- We could possibly have a limit on how many eigenvalues are printed, in case the user does not want around 400 "comp"s to be printed, e.g. the first 10 rows only: PCA(. . .)$eig[1:10, ]
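A minimal sketch of the last two points (limiting what is printed, and keeping a capped number of eigenvectors). It uses base R's `prcomp` on simulated data rather than the `PCA(. . .)` call above, so the data and the names `sst`, `eig`, `n_print` and `n_keep` are assumptions for illustration only, not what the dialogue currently does.

```r
# Simulated stand-in for the real data: 30 rows, 400 columns (an assumption).
set.seed(1)
sst <- matrix(rnorm(30 * 400), nrow = 30, ncol = 400)

pca <- prcomp(sst, center = TRUE, scale. = TRUE)

# Build an eigenvalue table similar in spirit to the $eig output mentioned above.
eigenvalue <- pca$sdev^2
eig <- cbind(
  eigenvalue = eigenvalue,
  percent    = 100 * eigenvalue / sum(eigenvalue),
  cumulative = 100 * cumsum(eigenvalue) / sum(eigenvalue)
)

# Print only the first n_print rows rather than every component.
n_print <- 10
eig_head <- head(eig, n_print)
print(eig_head)

# Keep at most n_keep eigenvectors (columns of the rotation matrix) for later use.
n_keep <- 150
keep <- seq_len(min(n_keep, ncol(pca$rotation)))
eigenvectors <- pca$rotation[, keep, drop = FALSE]
```

Capping with `min(n_keep, ...)` means the same code works whether the data give fewer or more components than the chosen limit.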
Thank you for starting work on this. I'll reply quickly on a trivial one of these.
If you do
Did you discuss the method of doing the PCA to make sure we use one that works with more columns than rows in the data?
Great, okay, thanks. Yes, it does limit it, but it still runs, I believe. It just limits the number of dimensions/components to the minimum of the number of rows and the number of columns. Alex says that that's fine, however!
I am also going to send this as an e-mail so that it can go to Mike and Patrick, and to Alex Riba (from SSD), as well.
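As a quick check of that behaviour, here is a small sketch with more columns than rows. It uses base R's `prcomp` on simulated data, which is an assumption; the routine behind the dialogue may report its components differently.

```r
# Simulated data with far more columns than rows (an assumption for illustration).
set.seed(42)
n_rows <- 10
n_cols <- 400
x <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)

# prcomp works via a singular value decomposition, so it runs even when
# there are more columns than rows.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# The number of components is capped by the smaller dimension of the data.
n_components <- length(pca$sdev)
print(n_components)
print(dim(pca$rotation))

# Centring removes one degree of freedom, so the last component
# carries essentially no variance.
last_variance <- pca$sdev[n_components]^2
print(last_variance)
```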
Steve Kogo did the importing of SSTs some time ago. There remains a small bug there - that the import dialogue comes up immediately the first time you use that dialogue. It is different on subsequent uses of the dialogue.
But more generally I am not clear how to use the dialogue to import the data, so can't test it yet. Steve, could you please give an example, and perhaps Mike or Patrick could write a first go at some documentation. I assume it imports 2 data frames?
I also assume these data frames are not yet linked in any way? David and Danny have been working on links and key columns so this presumably needs to be added to the importing.
Once these data are in R-Instat it would be good to use them to try out the principal components dialogue. There are (at least) 4 aspects here and perhaps Alex could work with Lily on this, keeping Steve Kogo "in the loop". Lily and Steve could share the R-Instat tasks perhaps taking their other work into account.
1) I would like to understand better the routine used to fit the PCs. I think Alex (in R) is using a routine that can cope with more x-variables (columns of SSTs) than there are rows. Perhaps that could be used as our default routine in the PCA analysis. (I don't mind whether there is a single method we use in R-Instat, or there is an option here.)
2) Alex has also been plotting the SST PCA results in heat maps relating to the geographical positions of the SSTs. In R-Instat that would use the 2 (presumably linked) data frames. I hope it would (at least eventually) be possible to do this as a graphics option in R-Instat.
3) The individual results from the PCA still don't get saved. I have written on this in previous messages. I hope that sub-command could be added - not so urgent perhaps, but it has been pending since March.
4) Alex has also been relating these PCAs to simple indicators - and then doing the PCA analysis after subtracting those components. It would be good for him to share that information with Lily - and hopefully David, to see whether that has an obvious place in R-Instat?
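One possible reading of point 4, sketched in base R: regress every SST column on a simple indicator, then run the PCA on the residuals. This is only a guess at what is meant, not Alex's actual procedure; the simulated data and the names `indicator`, `sst` and `sst_resid` are hypothetical.

```r
# Simulated data (an assumption): 30 years, 50 SST columns, one simple indicator.
set.seed(7)
n_years <- 30
indicator <- rnorm(n_years)
sst <- matrix(rnorm(n_years * 50), nrow = n_years)
sst <- sst + outer(indicator, runif(50))  # inject a signal tied to the indicator

# Regress every column on the indicator at once (matrix response)
# and keep the residuals, i.e. the part not explained by the indicator.
fit <- lm(sst ~ indicator)
sst_resid <- residuals(fit)

# PCA on what remains after the indicator has been removed.
pca_resid <- prcomp(sst_resid, center = TRUE, scale. = FALSE)

# By construction the residuals are uncorrelated with the indicator.
max_abs_cor <- max(abs(cor(sst_resid, indicator)))
print(max_abs_cor)
```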
Then I have written another issue on the possible writing out of the results of the PCA analysis (or the rainfall data) to CPT - for multiple regression in CPT. This would be the idea that we could do the PCA analysis in R-Instat - possibly for different parts of the oceans separately. Then assess the value of the results (again in R-Instat) using regression analysis. Then we export the data for both the x and the y variables, ready to fit the regression again in CPT (but perhaps using combined results from several parts of the globe). Then we look at the output in CPT.
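A rough sketch of that workflow in base R, under stated assumptions: the data are simulated, the names `sst`, `rain` and `n_pcs` are hypothetical, and the export is a plain tab-separated file. CPT has its own input format, which this sketch does not attempt to reproduce.

```r
# Simulated data (an assumption): 30 years of SSTs (x) and a rainfall total (y).
set.seed(3)
n_years <- 30
sst <- matrix(rnorm(n_years * 200), nrow = n_years)
rain <- rnorm(n_years)

# Step 1: PCA on the SSTs, keeping the scores of the first few components.
pca <- prcomp(sst, center = TRUE, scale. = TRUE)
n_pcs <- 3
scores <- pca$x[, seq_len(n_pcs), drop = FALSE]

# Step 2: assess the value of the components with a regression.
fit <- lm(rain ~ scores)
print(summary(fit)$r.squared)

# Step 3: write the y variable and the x variables (PC scores) to one file.
export <- data.frame(year = seq_len(n_years), rain = rain, scores)
out_file <- tempfile(fileext = ".tsv")
write.table(export, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

Running this separately for different parts of the oceans would give one set of scores per region, which could then be combined before the export step.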