dirko / pyhcrf

A hidden conditional random field (HCRF) implementation in Python.
BSD 2-Clause "Simplified" License

Could you please advise how to add feature vectors for each word? #1

Open munichong opened 9 years ago

munichong commented 9 years ago

In my case, each word has a feature vector with length 200. Their weights also need to be learned.

dirko commented 9 years ago

At the moment arbitrary feature vectors are not possible. The model only accepts one-hot encoded sparse input, for example the line in train.dat:

2 1 2 5 

represents a training example where the label is 2 and the input feature vector is [1 1 0 0 1]. (The 1st, 2nd, and 5th elements of the vector are 1 and the rest are zero.)
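For illustration, here's a minimal sketch (not part of the library) of how such a line maps to a label and a one-hot feature vector:

```python
# Sketch: decode one train.dat line into a label and a one-hot vector.
import numpy as np

line = "2 1 2 5"
label, *active = map(int, line.split())

n_features = 5                      # assumed size of the feature vocabulary
x = np.zeros(n_features, dtype=int)
x[np.array(active) - 1] = 1         # feature indices in the file are 1-based

print(label)  # 2
print(x)      # [1 1 0 0 1]
```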

Do you want to use neural-embeddings as input features? Or do you want to train the embeddings from scratch? What sort of application do you have in mind - we might be able to work something out?

munichong commented 9 years ago

Yes. I plan to use Word2Vec to generate a vector for each word in a sequence and then use the HCRF to classify each sequence. Most sequences contain about 2-4 words. There are 13 categories.

So far, I have just tried using the one-hot encoding as it is currently implemented. The learning process is very slow.

CONSOLE OUTPUT:

79832 training examples and 20168 test examples
RUNNING THE L-BFGS-B CODE
           * * *
Machine precision = 2.220D-16
N =      3005743     M =           10
At X0         0 variables are exactly at the bounds

After about one hour, it prints the result of the first iteration:

-203084.209946
At iterate    0    f=  2.03134D+05    |proj g|=  9.99996D-02

Actually, I was just using a small subset of the data I have. Do you have any ideas?
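For context, roughly what I have in mind with the Word2Vec features is something like this (a sketch only; it assumes gensim >= 4, and the data here is made up):

```python
# Sketch of generating 200-dimensional per-word features with Word2Vec.
# Assumes gensim >= 4 (older versions use size= instead of vector_size=).
from gensim.models import Word2Vec
import numpy as np

sequences = [["credit", "card", "payment"], ["wire", "transfer"]]  # toy data
w2v = Word2Vec(sequences, vector_size=200, min_count=1)

# One (T_i, 200) matrix per sequence: one row of features per word.
X = [np.vstack([w2v.wv[word] for word in seq]) for seq in sequences]
```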

dirko commented 9 years ago

The library is currently implemented in pure (and slow) Python.

I think that just porting parts of it to cython/numba will increase the speed tremendously, but unfortunately I don't have time to do it at the moment because I'm working on a more general graphical model package. I might attempt a rewrite in a week or two if I have time?

If you want to try to do it, however, I'm happy to help out here and there - especially if you have a pull-request or two you want me to look at. Adding support for arbitrary features shouldn't be that difficult either.

munichong commented 9 years ago

I am really interested in improving it, since there are few good HCRF packages out there. I am doing a project for my internship, which will end in the next few weeks, so I probably do not have enough time to improve the existing code right now. I will try it on a small subset of the data and see how it works. If the results are promising, I will try numba afterwards (even after my internship is over).

dirko commented 9 years ago

If there is interest then I'll put in some effort so we can have a nice HCRF package - I didn't realise that there weren't many implementations.

I've started to refactor the code to be closer to the sklearn interface. See this branch. The steps that I think will be necessary from here:

I'm excited to get this working nicely - I hope the above is what you have in mind as well.

dirko commented 9 years ago

I rewrote most of the model (see https://github.com/dirko/pyhcrf/pull/2) to conform more closely to the sklearn interface. Support for dense and sparse input has also been added. Let me know whether it works for you.
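Usage should end up looking roughly like this (a sketch only - the import, class name, and constructor argument below are assumptions, so check the PR for the exact names):

```python
# Rough usage sketch of the sklearn-style interface with dense input.
# The class name and hyperparameter name are assumptions; see the PR
# for the actual API.
import numpy as np
from pyhcrf import Hcrf  # assumed import

# Each example is a (T_i, D) matrix of per-word features;
# y holds one label per sequence.
X = [np.random.randn(3, 200), np.random.randn(2, 200)]
y = [0, 1]

model = Hcrf(num_states=3)  # assumed hyperparameter name
model.fit(X, y)
print(model.predict(X))
```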

Things I still need to do:

I suspect that the initialisation of the parameters will have a large effect on the model's performance, so I'll add a way to pass initialisations in.

YOUNGING commented 9 years ago

Could I use this library for image classification?

YOUNGING commented 9 years ago

I'm a novice with HCRFs. I read some papers and they mention an 'undirected graph', but I cannot find it in your code. That really confuses me. Could you give me some help, please? Thanks a lot.

dirko commented 9 years ago

Sure, it could be used for that - I think it might work for sequences of images. Then the input X to the fit method is a list of numpy arrays, where each array has shape (T_i, D): T_i is how many images are in that example, and D is the dimensionality of each image (the number of pixels or features - so if you're using raw pixels, each image must be flattened).
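For example, preparing a toy image-sequence dataset could look something like this (a sketch with made-up sizes and random data):

```python
# Sketch: build X as a list of (T_i, D) arrays, one array per example,
# with each image flattened into a row of D pixel values.
import numpy as np

images_per_example = [4, 2, 3]    # T_i for three toy examples
height, width = 8, 8              # toy image size, so D = 64
X = [np.random.rand(t, height * width) for t in images_per_example]
y = [0, 1, 0]                     # one label per sequence

# X and y can then be passed to the fit method described above.
```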

In the original paper they describe two models: one works with sequences and the other with trees. In this implementation I only implemented sequences (which form a simple chain graph - this graph structure is implicit, so you won't find any explicit reference to it in the code). I'm also interested in implementing trees but will try to do it in a more general graphical model package (I'm working on that here if you want to see how it is coming along).

Please let me know which parts I should elaborate more on! When I get time I'll also add an example of how to build an image sequence model.