accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1

Add an Example for HiddenMarkovModel(TDistribution, TObservation) Class #697

Closed isaactarume closed 7 years ago

isaactarume commented 7 years ago

Please add an example for HiddenMarkovModel(TDistribution, TObservation) Class.

I would like to see an example of a multivariate distribution being modeled within the HMM. The univariate examples work perfectly; however, my dataset has more than 2 variables per data point, hence I need a multivariate one:

"Time" "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "Amount" "Class"
0 -1.3598071336738 -0.0727811733098497 2.53634673796914 1.37815522427443 -0.338320769942518 0.462387777762292 0.239598554061257 0.0986979012610507 0.363786969611213 0.0907941719789316 -0.551599533260813 -0.617800855762348 -0.991389847235408 -0.311169353699879 1.46817697209427 -0.470400525259478 0.207971241929242 0.0257905801985591 0.403992960255733 0.251412098239705 -0.018306777944153 0.277837575558899 -0.110473910188767 0.0669280749146731 0.128539358273528 -0.189114843888824 0.133558376740387 -0.0210530534538215 149.62 "0"
0 1.19185711131486 0.26615071205963 0.16648011335321 0.448154078460911 0.0600176492822243 -0.0823608088155687 -0.0788029833323113 0.0851016549148104 -0.255425128109186 -0.166974414004614 1.61272666105479 1.06523531137287 0.48909501589608 -0.143772296441519 0.635558093258208 0.463917041022171 -0.114804663102346 -0.183361270123994 -0.145783041325259 -0.0690831352230203 -0.225775248033138 -0.638671952771851 0.101288021253234 -0.339846475529127 0.167170404418143 0.125894532368176 -0.00898309914322813 0.0147241691924927 2.69 "0"
1 -1.35835406159823 -1.34016307473609 1.77320934263119 0.379779593034328 -0.503198133318193 1.80049938079263 0.791460956450422 0.247675786588991 -1.51465432260583 0.207642865216696 0.624501459424895 0.066083685268831 0.717292731410831 -0.165945922763554 2.34586494901581 -2.89008319444231 1.10996937869599 -0.121359313195888 -2.26185709530414 0.524979725224404 0.247998153469754 0.771679401917229 0.909412262347719 -0.689280956490685 -0.327641833735251 -0.139096571514147 -0.0553527940384261 -0.0597518405929204 378.66 "0"
1 -0.966271711572087 -0.185226008082898 1.79299333957872 -0.863291275036453 -0.0103088796030823 1.24720316752486 0.23760893977178 0.377435874652262 -1.38702406270197 -0.0549519224713749 -0.226487263835401 0.178228225877303 0.507756869957169 -0.28792374549456 -0.631418117709045 -1.0596472454325 -0.684092786345479 1.96577500349538 -1.2326219700892 -0.208037781160366 -0.108300452035545 0.00527359678253453 -0.190320518742841 -1.17557533186321 0.647376034602038 -0.221928844458407 0.0627228487293033 0.0614576285006353 123.5 "0"
2 -1.15823309349523 0.877736754848451 1.548717846511 0.403033933955121 -0.407193377311653 0.0959214624684256 0.592940745385545 -0.270532677192282 0.817739308235294 0.753074431976354 -0.822842877946363 0.53819555014995 1.3458515932154 -1.11966983471731 0.175121130008994 -0.451449182813529 -0.237033239362776 -0.0381947870352842 0.803486924960175 0.408542360392758 -0.00943069713232919 0.79827849458971 -0.137458079619063 0.141266983824769 -0.206009587619756 0.502292224181569 0.219422229513348 0.215153147499206 69.99 "0"

Help Topic: http://accord-framework.net/docs/html/T_Accord_Statistics_Models_Markov_HiddenMarkovModel_2.htm

cesarsouza commented 7 years ago

Hi Isaac,

The first step is to organize your data as a nested jagged array.

In your example, each observation has 28 dimensions. You have three sequences in your dataset: the first sequence has 2 observations, the second sequence has 2 observations, and the third sequence has 1 observation.

This means that you have to organize your data as a jagged array like this:

double[][][] sequences = new double[3][][]; // because you have 3 sequences
sequences[0] = new double[2][]; // because the first sequence has 2 observations
for (int i = 0; i < 2; i++)
    sequences[0][i] = new double[28]; // because each observation has 28 dimensions

sequences[1] = new double[2][]; // because the second sequence has 2 observations
for (int i = 0; i < 2; i++)
    sequences[1][i] = new double[28]; // because each observation has 28 dimensions

sequences[2] = new double[1][]; // because the third sequence has 1 observation
sequences[2][0] = new double[28]; // because each observation has 28 dimensions

Of course, you can write a method that creates this array automatically from your dataset, without writing separate code for each sequence. I only spelled it out this way to help visualize what needs to be done.
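A minimal sketch of such a method, assuming each row has already been parsed into a `double[]` whose first column is "Time", and that rows sharing a Time value belong to the same sequence (the grouping assumed above). `SequenceBuilder.GroupByKey` is a hypothetical helper, not part of Accord.NET; it uses only LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceBuilder
{
    // Hypothetical helper (not an Accord.NET API): groups rows that share the
    // same value in column `keyColumn` into one sequence, and keeps only the
    // `featureCount` columns starting at `firstFeature` as the observation.
    // The result is laid out as [sequence][observation][dimension].
    public static double[][][] GroupByKey(double[][] rows, int keyColumn,
                                          int firstFeature, int featureCount)
    {
        return rows
            // LINQ to Objects keeps groups in order of first appearance,
            // and rows inside each group in their original order.
            .GroupBy(row => row[keyColumn])
            .Select(group => group
                .Select(row => row.Skip(firstFeature).Take(featureCount).ToArray())
                .ToArray())
            .ToArray();
    }
}
```

For the five rows quoted at the top of this issue (Time, V1..V28, Amount, Class), `SequenceBuilder.GroupByKey(rows, 0, 1, 28)` would return three sequences of 2, 2 and 1 observations with 28 values each, matching the hand-built array above and dropping Time, Amount and Class.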

Now, once you have populated this double[][][] array with your data, you can create a hidden Markov classifier as shown in the documentation:

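// Required using directives (at the top of the file):
using Accord.Statistics.Distributions.Fitting;
using Accord.Statistics.Distributions.Multivariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Labels for the sequences
int[] labels = { 0, 0, 0 }; // in the data you provided, all three sequences had class 0; adjust as needed

// Initial emission density to be copied to each state
var initialDensity = new MultivariateNormalDistribution(2);

// Creates a sequence classifier containing 2 hidden Markov models with 2 states each
// and a multivariate Normal distribution as the emission density of each state.
var classifier = new HiddenMarkovClassifier<MultivariateNormalDistribution, double[]>(
    classes: 2, topology: new Forward(2), initial: initialDensity);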

// Configure the learning algorithms to train the sequence classifier
var teacher = new HiddenMarkovClassifierLearning<MultivariateNormalDistribution, double[]>(classifier)
{
    // Train each model until the log-likelihood changes less than 0.0001
    Learner = modelIndex => new BaumWelchLearning<MultivariateNormalDistribution, double[], NormalOptions>(classifier.Models[modelIndex])
    {
        Tolerance = 0.0001,
        Iterations = 0,

        FittingOptions = new NormalOptions()
        {
            Diagonal = true,      // only diagonal covariance matrices
            Regularization = 1e-5 // avoid non-positive definite errors
        }
    }
};

// Train the sequence classifier using the algorithm
teacher.Learn(sequences, labels);

// Log-likelihood of the training sequences under the learned models
double logLikelihood = teacher.LogLikelihood;

// Use the model to predict your data
int[] prediction = classifier.Decide(sequences);

Hope it helps, Cesar

isaactarume commented 7 years ago

On Thu, Jul 13, 2017 at 12:00 AM, isaac tarume isaac.tarume@gmail.com wrote:

Thanks Cesar

Well, the jagged array gave me the runaround, since the actual observation file is very long, as I said: more than 250,000 data points. So I just left it; see your implementation up there.

So I then decided to use just the 6 data points I sent you before. I then ran into a mismatch error (DimensionMismatchException("x", "The vector should have the same dimension as the distribution.")) on the line

teacher.Learn(sequences, labels);

I quickly fixed it by changing the dimension to 30, since I have 30 dimensions for each point. (I could not even figure out how to drop the Time and Class attributes from the data.)

var initialDensity = new MultivariateNormalDistribution(2); changed to 30

For some reason this worked, but then I ran into another problem. When I calculate my logLikelihood, it is very small (-33.4385720557637780). That is not the main issue, but I have a question about the way you preprocessed the data.

  1. Why classify all the data points with the same Time value as belonging to the same sequence? Could we not instead partition the data into sequences of, say, length 10: the first sequence from the 1st row to the 10th row, the second sequence from the 2nd row to the 11th, the third sequence from the 3rd row to the 12th, and so on? I thought the log-likelihood of a longer sequence is always going to be smaller than that of a shorter one, yes? :)

  2. Is this data best modeled as a classification problem? I thought we could still model it as a pure hidden Markov model (not a hidden Markov model classifier).😉
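The overlapping partition described in question 1 can be sketched as follows, assuming the rows are already parsed into `double[]` feature vectors in time order. `Windowing.SlidingWindows` is a hypothetical helper, not part of Accord.NET:

```csharp
using System;

static class Windowing
{
    // Hypothetical helper: cuts a time-ordered table of observations into
    // overlapping windows of `length` rows that advance one row at a time
    // (rows 0..9, rows 1..10, rows 2..11, ... when length = 10).
    // Windows share the row arrays with the input; the rows are not copied.
    public static double[][][] SlidingWindows(double[][] rows, int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (length > rows.Length)
            return new double[0][][];   // not enough rows for a single window

        var windows = new double[rows.Length - length + 1][][];
        for (int start = 0; start < windows.Length; start++)
        {
            windows[start] = new double[length][];
            Array.Copy(rows, start, windows[start], 0, length);
        }
        return windows;
    }
}
```

With N rows this yields N - length + 1 sequences, and consecutive windows share length - 1 observations; whether that overlap suits the labels is a modeling choice the thread does not settle.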

I have attached the first 25,000 rows from the dataset; let me know if you still think your assumptions above hold. Also, I am struggling to create this data matrix automatically from the data. I am not very clued up on data preprocessing, so any help there would be welcome.
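A minimal sketch of the parsing step, assuming the file is whitespace-separated with one transaction per line, a header line of quoted column names, and a quoted "Class" value, as in the excerpt at the top of this issue. `TableReader.ParseRows` is a hypothetical helper; its output can then be grouped or windowed into the `[sequence][observation][dimension]` layout:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class TableReader
{
    // Hypothetical helper: parses whitespace-separated lines such as
    //   0 -1.3598071336738 ... 149.62 "0"
    // into numeric rows. The first line is treated as a header and skipped,
    // blank lines are ignored, and surrounding double quotes are stripped.
    public static double[][] ParseRows(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                      // skip the header line
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on any whitespace
                .Select(token => double.Parse(token.Trim('"'), CultureInfo.InvariantCulture))
                .ToArray())
            .ToArray();
    }
}
```

For example, `TableReader.ParseRows(File.ReadLines("DataForHMM2.txt"))` (with `using System.IO;`) would load the attached file, provided it follows that layout.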

Kind Regards and Many Thanks.


cesarsouza commented 7 years ago

Hi Isaac,

> Well, the jagged array gave me the runaround, since the actual observation file is very long, as I said: more than 250,000 data points. So I just left it; see your implementation up there.

You do not have to initialize the jagged array the way I did; you can initialize it any way you like. From the initial description you shared, it didn't seem very hard to convert the 2D table into a 3D jagged array whose second dimension (the number of observations in each sequence) has variable length.

> I quickly fixed it by changing the dimension to 30, since I have 30 dimensions for each point. (I could not even figure out how to drop the Time and Class attributes from the data.)

I am not sure you should keep the "Class" variable in the training data: judging by its name, I guess it should be the intended output of the model, not an input. Please disregard this if the information is actually available to you at test time, rather than only at training time.

> For some reason this worked, but then I ran into another problem. When I calculate my logLikelihood, it is very small (-33.4385720557637780). That is not the main issue, but I have a question about the way you preprocessed the data.

This is expected. If you are trying to perform sequence classification, it doesn't really matter how low the probabilities of your sequences are, only how they compare to the probabilities of other sequences. Plus, the probability of a single sequence in an HMM is guaranteed to get lower as the length of the sequence increases.
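To illustrate, here is a sketch of the scaled forward algorithm for a small HMM with discrete emissions (not Accord.NET's implementation; the parameters in the test are made up). Each step multiplies the running likelihood by a factor that is at most 1, so the log-likelihood can only stay the same or drop as observations are appended. With continuous densities such as the multivariate Normal used in this thread, individual density values can exceed 1, so the strict decrease is not guaranteed there.

```csharp
using System;

static class HmmForward
{
    // Log-likelihood log P(obs) of a discrete-emission HMM, computed with the
    // scaled forward algorithm.
    //   pi[i]   : initial probability of state i
    //   A[i, j] : transition probability from state i to state j
    //   B[i, k] : probability that state i emits symbol k
    public static double LogLikelihood(double[] pi, double[,] A, double[,] B, int[] obs)
    {
        int n = pi.Length;
        var alpha = new double[n];   // normalized forward variables from step t - 1
        double logLikelihood = 0;

        for (int t = 0; t < obs.Length; t++)
        {
            var next = new double[n];
            double scale = 0;
            for (int j = 0; j < n; j++)
            {
                // Probability of being in state j at time t, given obs[0..t-1]
                double reach = 0;
                if (t == 0)
                    reach = pi[j];
                else
                    for (int i = 0; i < n; i++)
                        reach += alpha[i] * A[i, j];

                next[j] = reach * B[j, obs[t]];
                scale += next[j];
            }

            // scale = P(obs[t] | obs[0..t-1]) <= 1, so each step can only lower the total.
            for (int j = 0; j < n; j++)
                next[j] /= scale;
            logLikelihood += Math.Log(scale);
            alpha = next;
        }
        return logLikelihood;
    }
}
```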

> 1. Why classify all the data points with the same Time value as belonging to the same sequence? Could we not instead partition the data into sequences of, say, length 10: the first sequence from the 1st row to the 10th row, the second sequence from the 2nd row to the 11th, the third sequence from the 3rd row to the 12th, and so on? I thought the log-likelihood of a longer sequence is always going to be smaller than that of a shorter one, yes? :)

I wasn't sure how your data was organized, so I assumed that all rows with the same "Time" value belonged to the same sequence. If that is not how your data is organized, just make sure that your jagged 3D array is organized as [sequence][observation][dimension].
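Because a layout mistake only surfaces later as a `DimensionMismatchException` inside `Learn`, a quick check before training can help. This is a hypothetical helper, assuming every observation should have exactly `dimensions` values (the number passed to the `MultivariateNormalDistribution` constructor); it also flags null or empty sequences:

```csharp
using System;

static class ShapeCheck
{
    // Hypothetical helper: verifies the [sequence][observation][dimension]
    // layout and returns a description of the first problem found, or null
    // if every observation has exactly `dimensions` values.
    public static string FindProblem(double[][][] sequences, int dimensions)
    {
        for (int s = 0; s < sequences.Length; s++)
        {
            if (sequences[s] == null || sequences[s].Length == 0)
                return $"sequence {s} is null or empty";

            for (int o = 0; o < sequences[s].Length; o++)
            {
                double[] observation = sequences[s][o];
                if (observation == null)
                    return $"sequence {s}, observation {o} is null";
                if (observation.Length != dimensions)
                    return $"sequence {s}, observation {o} has {observation.Length} values, expected {dimensions}";
            }
        }
        return null;
    }
}
```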

> 2. Is this data best modeled as a classification problem? I thought we could still model it as a pure hidden Markov model (not a hidden Markov model classifier).

It probably could, but I am sorry: I haven't fully understood what your dataset is about. Could you please describe a bit more what you are trying to achieve? For example, could you specify what your inputs are, what each sequence represents, and whether you have a class label associated with each sequence or with each observation in your dataset?

> I have attached the first 25,000 rows from the dataset; let me know if you still think your assumptions above hold. Also, I am struggling to create this data matrix automatically from the data. I am not very clued up on data preprocessing, so any help there would be welcome.

I am afraid you attached the file to the e-mail message but not to the GitHub issue. If you can, please attach it to the GitHub issue at https://github.com/accord-net/framework/issues/697; otherwise I won't be able to download it.

Again, thanks a lot for opening the issue and sharing so much information about your problem!

Regards, Cesar

isaactarume commented 7 years ago

DataForHMM2.txt

Here is the file

cesarsouza commented 7 years ago

Hi Isaac,

Many thanks!

Could you please let me know the meaning of the Time, Amount and Class columns in your data? What would be the final goal of your system after it has been trained? Which columns is it supposed to be able to predict?

Thanks! Cesar

isaactarume commented 7 years ago

Yeah, as I alluded to earlier, it's credit card transactions, so I guess Time is when the transaction happened relative to the next transaction, Amount is the purchase amount, and Class is whether it is fraudulent or not.

As for variables V1 to V28, I am not sure exactly what they are; they came already scrubbed, maybe with PCA or something like that. I got them as is.

Sorry, I should have made this clearer earlier.

Hope this helps, and many thanks, man, for sharing so much and helping.

Kind Regards


cesarsouza commented 7 years ago

Hi @isaactarume,

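Here is an example demonstrating how to load your data into a jagged array and learn an HMM for fraud detection from it:

// Required using directives (at the top of the file):
using System.Collections.Generic;
using Accord.Math;
using Accord.Statistics.Distributions.Fitting;
using Accord.Statistics.Distributions.Multivariate;
using Accord.Statistics.Models.Markov.Learning;

public void sequence_parsing_test()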
{
    Accord.Math.Random.Generator.Seed = 0;

    // Let's say we have the following data about credit card transactions,
    // where the data is organized in order of transaction, per credit card
    // holder. Every time the "Time" column is zero, a new sequence begins:
    // the observations that follow correspond to transactions of the same
    // person.

    double[,] data =
    {
        // "Time", "V1",   "V2",  "V3", "V4", "V5", "Amount",  "Fraud"
        {      0,   0.521, 0.124, 0.622, 15.2, 25.6,   2.70,      0 }, // first person, ok
        {      1,   0.121, 0.124, 0.822, 12.2, 25.6,   42.0,      0 }, // first person, ok

        {      0,   0.551, 0.124, 0.422, 17.5, 25.6,   20.0,      0 }, // second person, ok
        {      1,   0.136, 0.154, 0.322, 15.3, 25.6,   50.0,      0 }, // second person, ok
        {      2,   0.721, 0.240, 0.422, 12.2, 25.6,   100.0,     1 }, // second person, fraud!
        {      3,   0.222, 0.126, 0.722, 18.1, 25.8,   10.0,      0 }, // second person, ok
    };

    // Transform the above data into a jagged matrix
    double[][][] input;
    int[][] states;
    transform(data, out input, out states);

    // Number of dimensions in each observation (in this case, 6)
    int observationDimensions = 6; // 6 columns: "V1", "V2", "V3", "V4", "V5", "Amount"

    // Create some prior distributions to help initialize our parameters
    var priorC = new WishartDistribution(dimension: observationDimensions, degreesOfFreedom: 10); // 10 is an arbitrary choice; you might have to tune it like a hyperparameter
    var priorM = new MultivariateNormalDistribution(dimension: observationDimensions);

    // Configure a supervised maximum-likelihood learner for a single HMM,
    // where the hidden states are given directly by the fraud labels
    var teacher = new MaximumLikelihoodLearning<MultivariateNormalDistribution, double[]>()
    {
        // Each state's emission is a multivariate Normal distribution whose mean and
        // covariance are sampled from the prior distributions
        Emissions = (j) => new MultivariateNormalDistribution(mean: priorM.Generate(), covariance: priorC.Generate()),

        // We will prevent our covariance matrices from becoming degenerate by adding a small 
        // regularization value to their diagonal until they become positive-definite again:
        FittingOptions = new NormalOptions()
        {
            Regularization = 1e-6
        },
    };

    // Use the teacher to learn a new HMM 
    var hmm = teacher.Learn(input, states);

    // Use the HMM to predict whether the transactions were fraudulent or not:
    int[] firstPerson = hmm.Decide(input[0]); // predict the first person, output should be: 0, 0

    int[] secondPerson = hmm.Decide(input[1]); // predict the second person, output should be: 0, 0, 1, 0
}

// This is the method that can be used to transform your data into a jagged array:
private static void transform(double[,] data, out double[][][] input, out int[][] states)
{
    var sequences = new List<double[][]>();
    var classLabels = new List<int[]>();

    List<double[]> currentSequence = null;
    List<int> currentLabels = null;
    for (int i = 0; i < data.Rows(); i++)
    {
        // Check if the first column contains a zero, this would be an indication
        // that a new sequence (for a different person) is beginning:
        if (data[i, 0] == 0)
        {
            // Yes, this is a new sequence. Check if we were building
            // a sequence before, and if yes, save it to the list:
            if (currentSequence != null)
            {
                // Save the sequence we had so far 
                sequences.Add(currentSequence.ToArray());
                classLabels.Add(currentLabels.ToArray());

                currentSequence = null;
                currentLabels = null;
            }

            // We will be starting a new sequence
            currentSequence = new List<double[]>();
            currentLabels = new List<int>();
        }

        double[] features = data.GetRow(i).Get(1, 7); // Get values in columns from 1 (inclusive) to 7 (exclusive), meaning "V1", "V2", "V3", "V4", "V5", and "Amount"
        int classLabel = (int)data[i, 7]; // Column index 7 (the eighth column, "Fraud") holds the class label

        // Save this information:
        currentSequence.Add(features);
        currentLabels.Add(classLabel);
    }

    // Check if there are any sequences and labels that we haven't saved yet:
    if (currentSequence != null)
    {
        // Yes there are: save them
        sequences.Add(currentSequence.ToArray());
        classLabels.Add(currentLabels.ToArray());
    }

    input = sequences.ToArray();
    states = classLabels.ToArray();
}

However, if you try to execute it right now you might run into an issue regarding the initialization of the model. The fix is quite simple, but it might take a while until I can generate a new release or pre-release version that fixes the issue. What are your time-frame constraints right now? Do you have any deadlines for having this method available?

Regards, Cesar

cesarsouza commented 7 years ago

Ah yes: also, I've used a simplified version of your data that you can adjust to your needs. I was not sure if your data could be made public, so I just made up some values that looked like it.

isaactarume commented 7 years ago

Thanks Cesar,

So I am getting an error:

System.InvalidCastException occurred
  HResult=0x80004002
  Message=Unable to cast object of type 'System.Func`2[System.Int32,Accord.Statistics.Distributions.Multivariate.MultivariateNormalDistribution]' to type 'Accord.Statistics.Distributions.Multivariate.MultivariateNormalDistribution'.
  Source=Accord.Statistics
  StackTrace:
   at Accord.Statistics.Models.Markov.HiddenMarkovModel`2.<>c__DisplayClass28_0.<.ctor>b__0(Int32 i)
   at Accord.Statistics.Models.Markov.HiddenMarkovModel`2..ctor(ITopology topology, Func`2 emissions)
   at Accord.Statistics.Models.Markov.HiddenMarkovModel`2..ctor(Int32 states, Func`2 emissions)
   at Accord.Statistics.Models.Markov.Learning.MaximumLikelihoodLearning`2.Create(TObservation[][] x, Int32 numberOfClasses)
   at Accord.Statistics.Models.Markov.Learning.BaseMaximumLikelihoodLearning`4.Learn(TObservation[][] x, Int32[][] y, Double[] weights)
   at ConsoleApp2017.Program.Main(String[] args) in C:\Users\isaac.tarume\source\repos\ConsoleApp2017\ConsoleApp2017\Program.cs:line 81

I guess I will need a new version. My deadline is the end of July, so I have 2 weeks; if the new release comes out before that, it will be brilliant. It looks like you made a number of changes to the original methods and classes for this to work.

Thanks a lot for your passion on this and for sharing.

Kind Regards


cesarsouza commented 7 years ago

Hi @isaactarume,

I've just uploaded new pre-release packages to NuGet. You might be able to get them by ticking the "Include pre-release" checkbox when searching for the Accord.NET packages in the NuGet browser in Visual Studio.

Hopefully those new packages should make it possible to run the code I've shared above.

Regards, Cesar

isaactarume commented 7 years ago

Super cool. This new version works like magic. Thanks, Cesar.

Look, I am not trying to make you work on my thesis, but there is just one more problem.

Using this methodology gives a lot of misclassifications. When predicting the data it learned from, it works 100%, but as soon as you give it new data it never learned from, it always outputs zero, even for points whose class is supposed to be 1. So if I learn with 99 data points and add 1 extra data point to make it 100 for testing, the 100th point is always misclassified. I guess this has something to do with the way we interpreted the data (e.g. "Let's say we have the following data about credit card transactions, where the data is organized in order of transaction, per credit card holder. Every time the "Time" column starts at zero, the sequence of observations that follows corresponds to transactions of the same person.")

Maybe we need a domain expert for this data so we can make good assumptions about our sequences, etc. Otherwise, thanks so much. I really think I will find a way to make this work and predict accurately if I get a well-understood dataset with a domain expert.
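When nearly every label is 0, as is common in fraud data, a model that always answers 0 can still look accurate. A sketch for making that visible on held-out data, assuming the per-observation predictions come from `hmm.Decide(...)` and the true labels from the `states` array produced by `transform` in the example above; `Confusion` and `FraudMetrics.Count` are hypothetical helpers, not Accord.NET APIs:

```csharp
using System;

// Hypothetical result type: counts of correct and incorrect fraud decisions.
sealed class Confusion
{
    public int TruePositives, FalseNegatives, FalsePositives, TrueNegatives;

    // Fraction of actual frauds that were flagged (NaN if there were none).
    public double Recall => (double)TruePositives / (TruePositives + FalseNegatives);

    // Fraction of all observations labeled correctly.
    public double Accuracy =>
        (double)(TruePositives + TrueNegatives) /
        (TruePositives + FalseNegatives + FalsePositives + TrueNegatives);
}

static class FraudMetrics
{
    // Compares true and predicted labels observation by observation
    // (1 = fraud, 0 = ok) across all held-out sequences.
    public static Confusion Count(int[][] expected, int[][] predicted)
    {
        var c = new Confusion();
        for (int s = 0; s < expected.Length; s++)
        {
            if (expected[s].Length != predicted[s].Length)
                throw new ArgumentException($"sequence {s} has mismatched lengths");

            for (int t = 0; t < expected[s].Length; t++)
            {
                bool isFraud = expected[s][t] == 1;
                bool flagged = predicted[s][t] == 1;
                if (isFraud && flagged) c.TruePositives++;
                else if (isFraud) c.FalseNegatives++;
                else if (flagged) c.FalsePositives++;
                else c.TrueNegatives++;
            }
        }
        return c;
    }
}
```

On the toy labels from the example above (0, 0 for the first person and 0, 0, 1, 0 for the second), an all-zero prediction gets 5 of 6 observations right (about 83% accuracy) while its fraud recall is 0.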


isaactarume commented 7 years ago

Also, as an afterthought: is an HMM the best model for this kind of classification problem, in terms of getting the best results?

Or could one consider HCRFs or even neural nets? Does Accord offer these algorithms?

Kind Regards


cesarsouza commented 7 years ago

Hi @isaactarume, I am sorry I hadn't seen your update! Yes, Accord.NET offers both of those, but you can also take a look at SVMs trained with the Dynamic Time Warping kernel for sequence classification.
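For reference, the textbook Dynamic Time Warping alignment cost that such kernels build on is sketched below for multivariate sequences, using Euclidean distance between observations of equal dimension. This is not Accord.NET's kernel implementation, whose exact formulation may differ; `Dtw.Distance` is a hypothetical helper:

```csharp
using System;

static class Dtw
{
    // Classic dynamic time warping cost between two multivariate sequences
    // of possibly different lengths, with Euclidean local distance.
    public static double Distance(double[][] x, double[][] y)
    {
        int n = x.Length, m = y.Length;
        var cost = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i, j] = double.PositiveInfinity;
        cost[0, 0] = 0;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                double local = Euclidean(x[i - 1], y[j - 1]);
                // Extend the cheapest of: match, step in x only, step in y only.
                double best = Math.Min(cost[i - 1, j - 1], Math.Min(cost[i - 1, j], cost[i, j - 1]));
                cost[i, j] = local + best;
            }
        }
        return cost[n, m];
    }

    // Euclidean distance between two observations of the same dimension.
    static double Euclidean(double[] a, double[] b)
    {
        double sum = 0;
        for (int k = 0; k < a.Length; k++)
        {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }
}
```

Sequences of different lengths, such as the two card holders in the toy data above (2 and 4 transactions), can be compared directly this way.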

cesarsouza commented 7 years ago

The original issue was fixed in 3.7.0 (please do not hesitate to open a new issue if you would like to discuss the DTW-SVMs!)

Frank1481906280 commented 5 years ago

How do I save a model? I find that the API is obsolete.

isaactarume commented 5 years ago

Hi Cesar, thanks for your powerful updates on this. I am not sure if you have seen the Hierarchical HMM; do you have any suggestions on how to extend your current classes to implement an HHMM? Papers like S. Fine, Y. Singer and N. Tishby, "The Hierarchical Hidden Markov Model: Analysis and Applications", Machine Learning, vol. 32, p. 41–62, 1998 have the mathematical HHMM derivations.

Your suggestions will be highly appreciated.