bnosac / crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

Tutorial #6

Closed fahadshery closed 6 years ago

fahadshery commented 6 years ago

Looks like I hit a jackpot for creating my own models for NER, chunking etc. A small tutorial would be ideal :)

jwijffels commented 6 years ago

Please look at the package vignette.

fahadshery commented 6 years ago

I did look at it. I am still not sure how to open the Shiny app that comes with the package (sorry for being thick) or how to create a training dataset. I ran your examples successfully, though.

jwijffels commented 6 years ago

Open the file in RStudio and press the Run Document button.

fahadshery commented 6 years ago

I have located annotations.Rmd and installed the shiny and flexdashboard packages. But when I click Run -> Run All in RStudio, it gives an error: Error in output$ui_txt <- renderText({ : object 'output' not found

jwijffels commented 6 years ago

I can't reproduce this error. The app runs fine with the following shiny/flexdashboard versions:

> installed.packages()[c("shiny", "flexdashboard"), "Version"]
        shiny flexdashboard
      "1.0.5"     "0.5.1.1"

fahadshery commented 6 years ago

My versions are:

> installed.packages()[c("shiny", "flexdashboard"), "Version"]
        shiny flexdashboard
      "1.1.0"     "0.5.1.1"

Is there a way to install the same version as yours?

fahadshery commented 6 years ago

I installed the same version as yours: devtools::install_version("shiny", version = "1.0.5", repos = "http://cran.us.r-project.org"). Still getting the error.

jwijffels commented 6 years ago

Sure, get the package from CRAN and install it.

jwijffels commented 6 years ago

If you can't manage to install the Shiny app, you need to have or create training data in the format shown in the help page ?merge.chunkrange

fahadshery commented 6 years ago

It's definitely not working. I installed it on a MacBook and on Windows, and both give the same error at output$ui_txt <- renderText({ : object 'output' not found. I always thought Shiny apps were built with server.R and ui.R. Or am I supposed to open a different file?

jwijffels commented 6 years ago

It works fine and has been tested on Windows/Mac/Linux. Run it with rmarkdown::run(system.file(package = "crfsuite", "app", "annotation.Rmd"))
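
The error above is what you would expect when the chunks are run one by one in the console, because the `output` object only exists inside a running Shiny session; the document has to be launched through rmarkdown::run() instead. A minimal sketch, assuming you want a clearer message when the file cannot be found: `run_annotation_app` is a hypothetical helper (not part of crfsuite) that wraps the call above and relies on system.file() returning an empty string for a missing path.

```r
# Hypothetical helper, not part of crfsuite: wraps the call shown above.
run_annotation_app <- function(pkg = "crfsuite") {
  # system.file() returns "" when the package or the file does not exist
  app <- system.file("app", "annotation.Rmd", package = pkg)
  if (!nzchar(app)) {
    stop("annotation.Rmd not found; is the '", pkg, "' package installed?")
  }
  # Launch the flexdashboard document as a Shiny app
  rmarkdown::run(app)
}
```

Calling `run_annotation_app()` should then either start the app or fail with a readable message instead of a silent empty path.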

fahadshery commented 6 years ago

Genius! This is why it wasn't working! Now it is up and running! It is time to build custom models... I intend to use your StarSpace package as well. What's the key difference between the two?

fahadshery commented 6 years ago

I would also suggest adding 3 things to your wiki/intro/README file:

  1. Add this line to the intro for people who want to use the app: rmarkdown::run(system.file(package = "crfsuite", "app", "annotation.Rmd")) :)
  2. Add a confusion matrix instead of table(scores$label). This will show how accurate the model is after training and how well it performs on the test set. You just have to tweak table(scores$label) to table(train$label, scores$label).
  3. Lastly, for people who can't run the app, provide an example training file. This will help them get their data into the format crfsuite expects. If you can't add an example training set, then just provide something similar to the format mentioned by openNLP, i.e.:

     Mike who is just 10 months old.
     He likes to play in the lounge .

Again, thank you for the package. It's truly looking amazing!

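
The confusion matrix suggestion can be sketched in base R as follows. The labels here are made up for illustration (they are not output from a crfsuite model), and the sketch assumes the true and predicted labels refer to the same rows, for example both taken from the held-out test set.

```r
# Toy labels, invented for illustration only (not output from a real model).
truth     <- c("B-PER", "O", "O",     "B-LOC", "O", "O")
predicted <- c("B-PER", "O", "B-LOC", "B-LOC", "O", "O")

# Use a shared set of levels so the table is square and rows/columns line up.
lv <- sort(unique(c(truth, predicted)))
cm <- table(truth     = factor(truth, levels = lv),
            predicted = factor(predicted, levels = lv))
print(cm)

# Overall token-level accuracy: correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
```

Off-diagonal cells show which labels get confused with which, something a one-way table of predicted labels cannot show.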
juliopiubello commented 6 years ago

As @fahadshery said, some more information on how to get the data into the format crfsuite expects would be highly appreciated :)

sirhcdlanodcm commented 6 years ago

Just a heads up - there's a small typo in the code to run the shiny app in the vignette:

And run the app with rmarkdown::run(file = system.file(package = "crfsuite", "app", "annnotation.Rmd"))

Note the 3 n's in "annnotation.Rmd". Barely worth mentioning - but for people like me who like to copy and paste without reading, it might be a nice fix ;)

Super excited to explore this package!

jwijffels commented 6 years ago

The crfsuite R package requires you to have a tokenised data.frame (1 row per token, in the right sequence of token occurrence) containing:

Full copy-paste reproducible examples (approximately 7) of these data structures are provided in

May I ask you to first look there before raising issues.
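
As a rough illustration of "1 row per token, in the right sequence", here is a base R sketch that turns the two example sentences from earlier in the thread into a token-level data.frame. The column names (`doc_id`, `token`, `label`) and the `B-PER` / `O` labels are assumptions made for this example; check the package documentation for the columns crfsuite actually expects.

```r
# Two whitespace-tokenised sentences (the example text from this thread).
sentences <- c("Mike who is just 10 months old .",
               "He likes to play in the lounge .")
tokens <- strsplit(sentences, " ", fixed = TRUE)

# One row per token, in order of occurrence. Column names are hypothetical.
x <- data.frame(
  doc_id = rep(seq_along(tokens), lengths(tokens)),
  token  = unlist(tokens),
  label  = "O",              # assumed label meaning "not part of any entity"
  stringsAsFactors = FALSE
)

# Hand-label a single token for illustration (assumed IOB-style label).
x$label[x$token == "Mike"] <- "B-PER"
print(x)
```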