JuliaText / TextAnalysis.jl

Julia package for text analysis
Other
373 stars 95 forks source link

Average Perceptron POS Tagger (Issue #2) #131

Closed ComputerMaestro closed 5 years ago

ComputerMaestro commented 5 years ago

I added two files "averagePerceptron.jl" and "tagger.jl" to the "src" folder in "TextAnalysis" , former file contains the Average Perceptron model and latter file is used to train a model, predict tags or reuse a model using model in former file. I have not made any changes to any of the original files so currently the model can't be used through "TextAnalysis.jl", because I in tagger I have to include an optional argument "load" which specifies whether user want to load pretrain weights or not. If the user want to load pretrained weights then the input should "true" (Boolean), which I don't know how should I take (Since I am little new to julia). I have commented the code for "loadModel" function in "tagger.jl" which should run if user gives that argument as true. But for now we can train and test the model. I will give the pretrained weights shortly. Please suggest me changes to make the better and a way to take that argument("load") from the user If I am able to take that argument the model will be completed.

aviks commented 5 years ago

Nice, thanks.

Let's iterate on this step by step.

first, please use BSON.jl to store the weights.

For how to load a model take a look at sentiment.jl. Basically, create a model object, and load the pre-trained weights on calling the default constructor of the model. Does that make sense?

ComputerMaestro commented 5 years ago

Thankyou, I was actually confused between BSON and JLD. I looked into , and I am thinking of using struct instead of "module" is it fine or should I try something else.

ComputerMaestro commented 5 years ago

Sir , I trained model and saved the weights in bson format , with weights I have to save two more variable tagdict and classes. But there was problem in loading the classes I found that there is some issue in bson for loading Set type variables. Any suggestions what should do ??

aviks commented 5 years ago

Does is need to be a Set? Can you not make it a plain Array? That might be a simple workaround.

ComputerMaestro commented 5 years ago

yes, I did the same. I will commit it with trained weights.

ComputerMaestro commented 5 years ago

I could not upload weigths because the file is big. So I have the link to weights https://drive.google.com/open?id=1AQE5VRmI1qt3UMsxTRmpFcXrpchQRPwp The weights got are not very much accurate becaue they are only trained on 200000 examples I will train and update them more. Please check the averagePerceptronTagger.jl for any suggestion.

ComputerMaestro commented 5 years ago

Sorry sir, by mistake cloased it. I could not upload weights because the file is big in size. So this is the link to the weights https://drive.google.com/open?id=1AQE5VRmI1qt3UMsxTRmpFcXrpchQRPwp There pretrained are not very accurate because they are only trained on 200000 examples. I will update them more . Please check the averagePerceptronTagger.jl, and suggest if any change is to be made. Improved averagePerceptronTagger,.jl is in pending check now.

aviks commented 5 years ago

Hi @yash270200, Thanks, this is good. I have made some suggestions to make this look more idiomatic Julia. Please ask for clarification if any of the feedback is not clear.

ComputerMaestro commented 5 years ago

Thanks sir for your suggestions, I will surely make the changes as soon as possible.

ComputerMaestro commented 5 years ago

Sir I have made the suggested changes. Please have a look and give feedback for any further improvements.

ComputerMaestro commented 5 years ago

Sir, apart from this issue. I am also interested to work on issue #28, in that what interface do you expect for wordnet. Please give some hint so that I can proceed with it as well.

And one more thing, Sir I am making a doc for the average perceptron model. In that, should I explain full functionality of the functions or give examples only.

ComputerMaestro commented 5 years ago

Sir, in appveyor system x86 is not able to install Julia 0.7.0 and hence tests cannot be conducted in x86, so those tests are failing in Julia 0.7.0 . But x64 system the tests are passing in appveyor. Other than this tests in sentiment.jl are failing in x86 systems and passing in x64.

ComputerMaestro commented 5 years ago

Thanks Ayush, I will do the changes by 10th.

aviks commented 5 years ago

@ComputerMaestro This looks generally OK to me. But the tests fail with

Test threw exception UndefVarError(:AveragePerceptron)

Also, we should get some documentation. We have written some great docs for other parts of this package.

ComputerMaestro commented 5 years ago

@aviks , I have fixed that bug. I will complete the documentation as soon as possible.

ComputerMaestro commented 5 years ago

I have the changes @Ayushk4 , Have a look through docs.

Ayushk4 commented 5 years ago

This can be removed with the merging of this PR. https://github.com/JuliaText/TextAnalysis.jl/blob/c6841c6694b33efef36daf49eb8814c7db7190dd/src/preprocessing.jl#L218

ComputerMaestro commented 5 years ago

This can be removed with the merging of this PR.

https://github.com/JuliaText/TextAnalysis.jl/blob/c6841c6694b33efef36daf49eb8814c7db7190dd/src/preprocessing.jl#L218

Yeahh, Removed that

Ayushk4 commented 5 years ago

What should be done about this? https://github.com/JuliaText/TextAnalysis.jl/blob/e52fe6764b1a09144e87fbfce89be13e2224c5f9/src/preprocessing.jl#L6

The function tag_pos! is still being used by prepare!( ) in preprocess.jl and also being exported in TextAnalysis.jl

ComputerMaestro commented 5 years ago

I think @aviks can give a proper solution to this

ComputerMaestro commented 5 years ago

What should be done about this?

https://github.com/JuliaText/TextAnalysis.jl/blob/e52fe6764b1a09144e87fbfce89be13e2224c5f9/src/preprocessing.jl#L6

The function tag_pos! is still being used by prepare!( ) in preprocess.jl and also being exported in TextAnalysis.jl

@aviks , what should be done for this???

aviks commented 5 years ago

@aviks , what should be done for this???

If we think tag_pos should be removed, then change the implementation of prepare! and add a deprecation message to tag_pos.

Ayushk4 commented 5 years ago

src/pretrainedMod.bson is greater than 50MB, I get this warning upon doing the rebase with master -remote: warning: File src/pretrainedMod.bson is 50.83 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB.

Maybe we can store it elsewhere and call using DataDeps when needed.