facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Support for Indian languages and documentation help #677

Open shubhamchaurasia1 opened 2 years ago

shubhamchaurasia1 commented 2 years ago

Can somebody please guide me through the logic behind Duckling? I have been searching for documentation. I want to add support for Indian languages and rewrite the logic in Python. For me, the most important use case is extracting time expressions from arbitrary text.

stroxler commented 2 years ago

Hi @shubhamchaurasia1 - I can try to collect some resources for how to get started working on Duckling language models in the next 2-3 weeks.

Unfortunately we don't have much documentation right now. If you want to get started right away, your best bet is to look at some of the existing Rules.hs files to get a sense of how the rules are written. The English language support is probably the most mature, so that might be a good place to start.

shubhamchaurasia1 commented 2 years ago

Thank you @stroxler - It would be a great help if you could share some resources. I will start with the Rules.hs files to understand how the rules are written.

shubhamchaurasia1 commented 2 years ago

Hi @stroxler - I read the Rules.hs and Corpus.hs files to understand how the rules are written for different dimensions. However, I am still unable to figure out how the classifier is used when extracting entities.

Can I get a basic idea of the flow of the project, such as how training happens for a dimension and how Duckling employs classifiers?

stroxler commented 2 years ago

The training happens out-of-band - there's an executable that will use the training corpus to fit a very simple statistical model, and re-generate source files that include hardcoded weights.

I've never tried rebuilding classifiers from the open-source repo, but you ought to be able to do it using the command stack build :duckling-regen-exe. If you want to see what's going on under the hood you can trace that down: the command is defined in duckling.cabal, and at a high level it will run RegenMain.hs, which fits a Naive Bayes model that we use for ambiguous parses.

The README recommends running this to do an end-to-end test of changes that could alter classifier outputs:

stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test
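
To make the Naive Bayes idea above a bit more concrete, here is a minimal Haskell sketch of a two-class classifier with add-one smoothing. It is not Duckling's actual classifier code: the type and function names (ClassCounts, train, logScore, looksValid), the choice of rule names as features, and the tiny corpus are all assumptions made up for illustration. It only needs the containers package that ships with GHC.

```haskell
module Main where

import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- A feature is just a label, e.g. the name of a rule that fired.
-- (Hypothetical choice of features, not taken from Duckling.)
type Feature = String

-- Counts gathered from the training corpus for one class.
data ClassCounts = ClassCounts
  { exampleCount  :: Int                  -- training examples in this class
  , featureCounts :: Map.Map Feature Int  -- occurrences of each feature
  }

emptyCounts :: ClassCounts
emptyCounts = ClassCounts 0 Map.empty

addExample :: ClassCounts -> [Feature] -> ClassCounts
addExample (ClassCounts n fc) feats =
  ClassCounts (n + 1) (foldl' (\m f -> Map.insertWith (+) f 1 m) fc feats)

-- Returns (counts for good parses, counts for bad parses).
train :: [([Feature], Bool)] -> (ClassCounts, ClassCounts)
train = foldl' step (emptyCounts, emptyCounts)
  where
    step (ok, ko) (feats, True)  = (addExample ok feats, ko)
    step (ok, ko) (feats, False) = (ok, addExample ko feats)

-- Log prior plus summed log likelihoods, with add-one smoothing.
-- The extra "+ 1" in the denominator reserves one slot for unseen features.
logScore :: Int -> Int -> ClassCounts -> [Feature] -> Double
logScore total vocab (ClassCounts n fc) feats =
  logPrior + sum (map logLik feats)
  where
    logPrior = log (fromIntegral (n + 1) / fromIntegral (total + 2))
    tokens   = sum (Map.elems fc)
    logLik f = log ( fromIntegral (Map.findWithDefault 0 f fc + 1)
                   / fromIntegral (tokens + vocab + 1) )

-- True when the "good parse" class scores higher than the "bad parse" class.
looksValid :: (ClassCounts, ClassCounts) -> [Feature] -> Bool
looksValid (ok, ko) feats =
  logScore total vocab ok feats > logScore total vocab ko feats
  where
    total = exampleCount ok + exampleCount ko
    vocab = Map.size (Map.union (featureCounts ok) (featureCounts ko))

main :: IO ()
main = do
  let corpus =
        [ (["integer", "minute (grain)"], True)
        , (["integer", "hour (grain)"], True)
        , (["regex", "regex"], False)
        ]
      model = train corpus
  print (looksValid model ["integer", "minute (grain)"])  -- prints True
  print (looksValid model ["regex", "regex"])             -- prints False
```

The real regeneration step writes the fitted weights back into source files, as described above; this sketch keeps everything in memory so that the scoring logic is easy to follow.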

stroxler commented 2 years ago

I did confirm that running

stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

on my laptop seems to work alright, so I think this is all you should need.

I believe there is a way to regenerate for just one dimension + language, which would be much faster if you need to make a series of updates (usually this isn't necessary). It probably requires manually running a command from stack repl. But it's been a while, so I would have to dig around to find the right command.

chessai commented 2 years ago

$ stack repl
> :l Duckling.Ranking.Generate
> regenLangClassifiers <LANG>

shubhamchaurasia1 commented 2 years ago

Hi, I am trying to run the debugger, but I am unable to configure its launch.json file. Can someone please help me or share how to set it up?

In the JSON file below I put stack ghci exe/RegenMain.hs as the ghciCmd because I want to run this file. But whenever I start the debugger with this configuration, it starts and then stops immediately without any results. What changes should I make to get the debugger running?

Current launch.json settings:


{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [

        {
            "type": "ghc",
            "request": "launch",
            "name": "haskell(stack)",
            "internalConsoleOptions": "openOnSessionStart",
            "workspace": "${workspaceFolder}",
            "startup": "${workspaceFolder}/exe/RegenMain.hs",
            "startupFunc": "",
            "startupArgs": "",
            "stopOnEntry": false,
            "mainArgs": "",
            "ghciPrompt": "H>>= ",
            "ghciInitialPrompt": "> ",
            "ghciCmd": "stack ghci exe/RegenMain.hs",
            "ghciEnv": {},
            "logFile": "${workspaceFolder}/.vscode/phoityne.log",
            "logLevel": "WARNING",
            "forceInspect": false
        }
    ]
}

stroxler commented 2 years ago

Unfortunately I'm not familiar with vscode debugging of Haskell code. You'd probably have to find a forum where there are Haskell stack experts.

For what it's worth, when developing new rules for Duckling I have mostly just relied on the interactive capabilities in ghci - Duckling has built-in support for "debug output" which will help you visualize the parse tree and the rules that ran when interpreting any given input.

For example:

$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]

As a rule I'd say this kind of debugging is likely to get you further than a debugger, assuming that you're trying to develop the rules as opposed to work on the engine internals.
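
As an aid to reading the debug output above, here is a small Haskell sketch of how such an indented listing can be produced from a tree of rule matches. The ParseNode type and renderTree function are hypothetical names invented for this example, not Duckling's own types; the tree is typed in by hand from the "in two minutes" output shown above.

```haskell
module Main where

-- One node per rule that matched: the rule's name, the text it covered,
-- and the sub-matches it was built from. (Hypothetical type for illustration.)
data ParseNode = ParseNode
  { ruleName :: String
  , matched  :: String
  , children :: [ParseNode]
  }

-- One line per node, each nesting level prefixed with another "-- ".
renderTree :: ParseNode -> [String]
renderTree = go 0
  where
    go :: Int -> ParseNode -> [String]
    go depth (ParseNode name body kids) =
      (concat (replicate depth "-- ") ++ name ++ " (" ++ body ++ ")")
        : concatMap (go (depth + 1)) kids

-- The tree behind the "in two minutes" debug output, written out by hand.
example :: ParseNode
example =
  ParseNode "in|within|after <duration>" "in two minutes"
    [ ParseNode "regex" "in" []
    , ParseNode "<integer> <unit-of-duration>" "two minutes"
        [ ParseNode "integer (0..19)" "two" [ParseNode "regex" "two" []]
        , ParseNode "minute (grain)" "minutes" [ParseNode "regex" "minutes" []]
        ]
    ]

main :: IO ()
main = mapM_ putStrLn (renderTree example)
```

Read this way, each "-- " level in the debug output is one step down the tree: the top line is the rule that produced the final entity, and the regex lines at the bottom are the leaves that matched raw text.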

shubhamchaurasia1 commented 2 years ago

This kind of debugging does not work for me because I am trying to work on the engine internals and analyze the classifier.

These details would help me and other contributors get a detailed understanding of the whole process.

stroxler commented 2 years ago

The full details will definitely require some digging.

In case it may help unblock you, I posted some notes I took on Duckling internals last year at https://gist.github.com/stroxler/1187695c98e94b0f3ea7dbc1efadf0a8

I'm hoping to get these into the Duckling source code at some point, but I'm not sure if or when that will happen.

Hopefully this helps.