caseykneale / ChemometricsTools.jl

A collection of tools for chemometrics and machine learning written in Julia.

SNV() #12

Closed dorian-kiwi closed 5 years ago

dorian-kiwi commented 5 years ago

Just following your examples....

julia> FauxSpectra1 = randn(10,200);

julia> SNV = StandardNormalVariate(FauxSpectra1);

julia> Transformed1 = SNV(FauxSpectra1);
ERROR: MethodError: objects of type Array{Float64,2} are not callable
Use square brackets [] for indexing an Array.
Stacktrace:
 [1] top-level scope at none:0

caseykneale commented 5 years ago

Ah Okay - there actually isn't a bug here except with the documentation!

So SNV is no longer a transformation like, for example, CenterScale() is. It's just a function: you put data in and data comes out. I don't know why I ever considered it to be a transformation. Pretty sure the chemometrics definition is the row-wise operation rather than the column-wise one (CenterScale).
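A rough sketch of the row-wise versus column-wise distinction, in plain Julia. This is not the package's code: `snv_rows` and `center_scale_cols` are hypothetical names made up for illustration, and the sketch assumes each row is one spectrum.

```julia
using Statistics

# Hypothetical helper (not the package's implementation):
# standardize each ROW (one spectrum) to mean 0 and standard deviation 1.
snv_rows(X) = (X .- mean(X, dims=2)) ./ std(X, dims=2)

# Hypothetical column-wise counterpart:
# standardize each COLUMN (one variable) across all samples.
center_scale_cols(X) = (X .- mean(X, dims=1)) ./ std(X, dims=1)

FauxSpectra1 = randn(10, 200)

# A plain function call: data in, data out, nothing stored.
# Per the reply above, the package equivalent would be calling
# StandardNormalVariate(FauxSpectra1) directly and using its return value.
Transformed1 = snv_rows(FauxSpectra1)
```

The `MethodError` in the original report follows from this: the value bound to `SNV` was already the transformed array, so calling it like a function fails.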

Transformations store information about the data so they can be reversed, either in pipelines or line by line. Preprocessing functions are just ordinary functions.
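To make "stores information so it can be reversed" concrete, here is a minimal sketch in plain Julia. The type `ColumnCenterScale` and its `inverse` keyword are assumptions for illustration only; the package's own CenterScale may be defined and called differently.

```julia
using Statistics

# Hypothetical type for illustration: it keeps the column means and
# standard deviations learned from the data it was built on.
struct ColumnCenterScale
    mu::Matrix{Float64}
    sigma::Matrix{Float64}
end

# Learn the statistics from a data matrix (rows = samples).
ColumnCenterScale(X::AbstractMatrix) =
    ColumnCenterScale(mean(X, dims=1), std(X, dims=1))

# Make instances callable: forward by default, reverse on request.
(T::ColumnCenterScale)(X; inverse=false) =
    inverse ? X .* T.sigma .+ T.mu : (X .- T.mu) ./ T.sigma

X = randn(10, 200)
CS = ColumnCenterScale(X)          # stores mu and sigma
Z = CS(X)                          # apply the transformation
Xback = CS(Z; inverse=true)        # undo it using the stored statistics
```

Because the statistics live in the object, the same `CS` can be applied to new data or undone later, which a stateless function cannot do.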

To address this I will update the documentation for that example. I'm sorry for any confusion, it's pretty hard maintaining all aspects of a package this large by myself.

caseykneale commented 5 years ago

Please try the new walkthrough on : https://caseykneale.github.io/ChemometricsTools.jl/Demos/Transforms/

and let me know if you bump into any other inconsistencies.

dorian-kiwi commented 5 years ago

Thanks Casey

I worked out how it was meant to be done soon after posting the issue on GitHub. I must say, I am very impressed by your package. The tutorials are however a little sparse - do you recommend a text that would provide a little more detail? Some of the issues you raise, like Direct Standardizations, reference key papers I cannot obtain from any of the university libraries, and the freely available abstracts on those papers are a bit superficial.

I am doing some work on IR applied to milk samples - a little outside my normal research area on linear mixed models in genetics.

Regards Dorian

On 18/08/2019, at 11:46 PM, Casey Kneale notifications@github.com wrote:

Please try the new walkthrough on : https://caseykneale.github.io/ChemometricsTools.jl/Demos/Transforms/ and let me know if you bump into any other inconsistencies.


caseykneale commented 5 years ago

No problem at all,

The tutorials are very sparse. I've unfortunately spent most of the effort writing the code and using it rather than documenting everything.

There are some books about chemometrics, but unfortunately a lot of them do not cover the expanse of the field. An excellent introductory book would be: https://www.crcpress.com/Introduction-to-Multivariate-Statistical-Analysis-in-Chemometrics/Varmuza-Filzmoser/p/book/9781420059472 It's a very approachable book.

I'm very sorry a lot of the journals are pay-walled for you. I did some work on milk powder samples in the past - it was fun. Yes, the models used in genetics are a bit different; a major theme in spectroscopic signals is multicollinearity and coping with it.

dorian-kiwi commented 5 years ago

Thanks for the book suggestion. We are used to dealing with multicollinearity with genome data - e.g. using up to 20 million loci as features and perhaps 100,000 genotyped individuals - but we mostly use Bayesian methods that treat the features as random (and therefore shrink them) and often simultaneously fit mixture models - the simplest of which assumes a feature either has zero effect or an effect drawn from a Normal or t-distribution - but this means we need to use MCMC...

Regards Dorian

On 19/08/2019, at 10:35 AM, Casey Kneale notifications@github.com wrote:

No problem at all,

The tutorials are very sparse. I've unfortunately spent most of the effort writing the code and using it rather than documenting everything.

There are some books about chemometrics, but unfortunately a lot of them do not cover the expanse of the field. An excellent introductory book would be: https://www.crcpress.com/Introduction-to-Multivariate-Statistical-Analysis-in-Chemometrics/Varmuza-Filzmoser/p/book/9781420059472 It's a very approachable book.

I'm very sorry a lot of the journals are pay-walled for you. I did some work on milk powder samples in the past - it was fun. Yes, the models used in genetics are a bit different; a major theme in spectroscopic signals is multicollinearity and coping with it.


caseykneale commented 5 years ago

Interesting. Usually there isn't a need to do things like MCMC or Bayesian methods for IR signals. Good experimental design, preprocessing and modelling can be enough for basic tasks. However, there are always more nuances.

Let me know if there's anything you need clarified in the documentation or any features you're curious about. Don't be afraid to file issues; the more users I can get to try this toolset out, the better.