clulab / processors

Natural Language Processors
https://clulab.github.io/processors/

Debug some more #793

Closed kwalcock closed 3 months ago

kwalcock commented 3 months ago

Always indicate next posId

kwalcock commented 3 months ago

@navalani, before this change the output for the new verb-tense rule was a bit hard to follow and 6 was missing:

There was an extractor: verb-tense
1. SaveStart(--GLOBAL--)
2. MatchToken(B-PER matches John) -> 3
3. Split.  Check out my LHS and RHS!
   (LHS)   4. MatchToken(I-PER) -> 3
   (RHS)   5. Pass
7. Split.  Check out my LHS and RHS!
      (LHS)8. Split.  Check out my LHS and RHS!
         (LHS)         9. MatchToken(ate) -> 0
         0. Done
         (RHS)         10. MatchToken(eats matches eats) -> 0
         0. Done
      (RHS)      11. MatchToken(eating) -> 0
      0. Done
   12. MatchToken(B-FOOD) -> 13
13. Split.  Check out my LHS and RHS!
      (LHS)      14. MatchToken(I-FOOD) -> 13
      (RHS)      15. Pass
      16. SaveEnd(--GLOBAL--)
      0. Done

With this change, it seems a little more understandable.

There was an extractor: verb-tense
1. SaveStart(--GLOBAL--) -> 2
2. MatchToken(B-PER matches John) -> 3
3. Split.  Check out my LHS and RHS!
   (LHS) 4. MatchToken(I-PER) -> 3
   (RHS) 5. Pass -> 6
   (RHS) 6. MatchLookAhead.  Check out my start. -> 12
      (Start) 7. Split.  Check out my LHS and RHS!
         (LHS) 8. Split.  Check out my LHS and RHS!
            (LHS) 9. MatchToken(ate) -> 0
            (LHS) 0. Done
            (RHS) 10. MatchToken(eats matches eats) -> 0
            (RHS) 0. Done
         (RHS) 11. MatchToken(eating) -> 0
         (RHS) 0. Done
   (RHS) 12. MatchToken(B-FOOD) -> 13
   (RHS) 13. Split.  Check out my LHS and RHS!
      (LHS) 14. MatchToken(I-FOOD) -> 13
      (RHS) 15. Pass -> 16
      (RHS) 16. SaveEnd(--GLOBAL--) -> 0
      (RHS) 0. Done

It seems like a Done inside a MatchLookAhead might be interpreted differently from a Done elsewhere. I think the lookahead might start a new thread, in which case the Done just finishes that thread and matching then moves on from the lookahead's next (12 in the output above). To me this makes the case that looking at more examples is useful.
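
The hypothesis about Done inside a lookahead can be sketched with a toy matcher. This is only an illustration of the idea, not Odin's evaluator: the `Inst` hierarchy, `run`, and `program` below are hypothetical names that echo the debugger output, and the lookahead is modeled as a sub-evaluation whose Done reports success without ending the outer match.

```scala
object LookaheadSketch {
  // Hypothetical instruction set; the names echo the debugger output
  // but these are not the classes used by processors/Odin.
  sealed trait Inst
  case object Done extends Inst
  final case class MatchToken(word: String, next: Inst) extends Inst
  final case class Split(lhs: Inst, rhs: Inst) extends Inst
  final case class MatchLookAhead(start: Inst, next: Inst) extends Inst

  // Returns the token index at which each successful thread reaches Done.
  def run(inst: Inst, tok: Int, words: Vector[String]): Set[Int] = inst match {
    case Done => Set(tok)
    case MatchToken(word, next) =>
      if (tok < words.length && words(tok) == word) run(next, tok + 1, words)
      else Set.empty
    case Split(lhs, rhs) =>
      // Each branch is its own thread, an "alternative reality".
      run(lhs, tok, words) ++ run(rhs, tok, words)
    case MatchLookAhead(start, next) =>
      // Assumption: the lookahead runs as a sub-evaluation. A Done reached
      // inside it only says "the lookahead succeeded"; the outer thread
      // then continues from `next` at the same, unconsumed tok.
      if (run(start, tok, words).nonEmpty) run(next, tok, words)
      else Set.empty
  }

  // Roughly: John, then a lookahead for (ate | eats), then the end of the rule.
  val program: Inst = MatchToken("John",
    MatchLookAhead(Split(MatchToken("ate", Done), MatchToken("eats", Done)), Done))

  def main(args: Array[String]): Unit = {
    println(run(program, 0, Vector("John", "eats")))   // Set(1)
    println(run(program, 0, Vector("John", "sleeps"))) // Set()
  }
}
```

In the first call the inner Done is reached at token index 2, yet the overall match ends at index 1, because the lookahead's result is used only as a yes/no answer. If the real evaluator behaves this way, a Done nested under MatchLookAhead would need to be displayed differently from the final Done of the rule.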

navalani commented 3 months ago

Got it. Yeah I agree that we need more examples to test the visualization.

navalani commented 3 months ago

Hello Keith,

I am trying to report the character positions of rule matches in the text and want to use the tok field for that. I noticed that tok is a field in the singleThread, but I need some help with how to use singleThread to visualize it.

Thanks, Nick

kwalcock commented 3 months ago

@navalani, be sure to look at the kwalcock/debugContext branch (https://github.com/clulab/processors/tree/kwalcock/debugContext), especially at Evaluator.mkThreads. My understanding is that we're dealing with an NDFA, a non-deterministic finite (state) automaton. Each thread is like an alternative reality for a different path through the rules and tokens. An Inst can be the current Inst of numerous threads, each of which may be processing a different tok (the index of the token/word in the sentence). Unfortunately, the Inst does not know about all the threads it's in and therefore does not know the relevant toks. We could change that, but it's very tricky. (It gets mixed up with issues of identity and equality.) Instead, I've had something else keep track of the Inst/tok pairings, at least if a thread gets far enough to use an Inst with a tok. That thing is the transcript in the Debugger, which stores DebuggerRecords. You could query the transcript to see all the places where an Inst has matched (or not matched). There are a couple of examples in the OdinStarter in that branch. Something like this seems to be necessary.
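
The idea of a transcript that records Inst/tok pairings can be sketched as follows. This is a minimal illustration under stated assumptions, not the Debugger in the branch: `Inst`, `Record`, `Transcript`, `toksFor`, and `charSpan` are hypothetical names, and the per-token character offsets are assumed to be available as two arrays indexed by tok.

```scala
object TranscriptSketch {
  // Hypothetical stand-ins; the real Inst and DebuggerRecord carry more state.
  final case class Inst(posId: Int, label: String)
  final case class Record(inst: Inst, tok: Int, matched: Boolean)

  final class Transcript {
    private val records = scala.collection.mutable.ArrayBuffer.empty[Record]

    // Called whenever some thread tries an Inst against a token index.
    def record(inst: Inst, tok: Int, matched: Boolean): Unit =
      records += Record(inst, tok, matched)

    // Token indices at which the Inst was tried with the given outcome,
    // in first-seen order and without repeats from different threads.
    def toksFor(inst: Inst, matched: Boolean): List[Int] =
      records.filter(r => r.inst == inst && r.matched == matched).map(_.tok).distinct.toList
  }

  // Maps a token index to a (start, end) character span.
  // Assumption: the offsets are supplied as arrays indexed by token.
  def charSpan(tok: Int, startOffsets: Array[Int], endOffsets: Array[Int]): (Int, Int) =
    (startOffsets(tok), endOffsets(tok))

  def main(args: Array[String]): Unit = {
    // Offsets for the text "John eats cake".
    val startOffsets = Array(0, 5, 10)
    val endOffsets = Array(4, 9, 14)
    val eats = Inst(10, "MatchToken(eats)")
    val transcript = new Transcript
    transcript.record(eats, 2, matched = false)
    transcript.record(eats, 1, matched = true)
    transcript.record(eats, 1, matched = true) // a second thread with the same pairing
    transcript.toksFor(eats, matched = true).foreach { tok =>
      val span = charSpan(tok, startOffsets, endOffsets)
      println(s"${eats.posId}. ${eats.label} matched tok $tok at chars $span")
    }
  }
}
```

Keeping the pairings outside the Inst sidesteps the identity and equality issues mentioned above, since the Inst itself stays unchanged and the transcript is the only place that grows as threads advance.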

Does the question mean that you have a substantial number of examples in TestTokenExtractorDebugger?

navalani commented 3 months ago

Thanks, I will work on that. I've added four more examples in TestTokenExtractorDebugger which I have pushed to the repo.

Nick

kwalcock commented 3 months ago

@navalani, can you look through the paper (https://arxiv.org/pdf/1509.07513) and try to collect all the operators, or whatever they are called? For example, I don't see ?!, ?<!, {n}, {,m}, {n,}, or some of their lazy forms. What we're interested in now is the kind of graph of Insts that each of these creates, and at some point we will want to make sure that we can identify the part of the rule where each one is written.
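
As a rough picture of the kind of Inst graph a bounded quantifier could produce, here is a toy expansion of `word{min,max}`. It follows the textbook approach (min required copies followed by max - min nested optional copies, with the Split branch order deciding greedy versus lazy) and is an assumption about the general technique, not a description of how the Odin compiler actually lays out its Insts. All names are hypothetical. The unbounded form `{n,}` would also need a back edge, like the 4 -> 3 loop in the output earlier in this thread, and is left out.

```scala
object QuantifierSketch {
  // Hypothetical toy instructions; Split tries lhs before rhs.
  sealed trait Inst
  case object Done extends Inst
  final case class MatchToken(word: String, next: Inst) extends Inst
  final case class Split(lhs: Inst, rhs: Inst) extends Inst

  // Builds the graph for word{min,max} followed by `next`.
  def repeat(word: String, min: Int, max: Int, lazyForm: Boolean, next: Inst): Inst = {
    require(0 <= min && min <= max, "expected 0 <= min <= max")
    // Optional copies, built from the inside out: each level either
    // takes one more token or skips straight to `next`.
    val optional = (1 to (max - min)).foldLeft(next) { (rest, _) =>
      val take = MatchToken(word, rest)
      if (lazyForm) Split(next, take) else Split(take, next)
    }
    // Required copies in front of the optional part.
    (1 to min).foldLeft(optional) { (rest, _) => MatchToken(word, rest) }
  }

  // End positions of all successful threads, highest priority first.
  def run(inst: Inst, tok: Int, words: Vector[String]): List[Int] = inst match {
    case Done => List(tok)
    case MatchToken(word, next) =>
      if (tok < words.length && words(tok) == word) run(next, tok + 1, words) else Nil
    case Split(lhs, rhs) => run(lhs, tok, words) ++ run(rhs, tok, words)
  }

  def main(args: Array[String]): Unit = {
    val words = Vector("very", "very", "very")
    println(run(repeat("very", 1, 2, lazyForm = false, Done), 0, words)) // List(2, 1)
    println(run(repeat("very", 1, 2, lazyForm = true, Done), 0, words))  // List(1, 2)
    println(run(repeat("very", 2, 2, lazyForm = false, Done), 0, words)) // List(2)
  }
}
```

In this toy, `{n}` is `repeat(word, n, n, ...)` and `{,m}` is `repeat(word, 0, m, ...)`. The greedy and lazy forms produce the same set of end positions and differ only in which one is preferred, which is the sort of difference the debugger output would need to make visible.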

navalani commented 3 months ago

Sure, I'll check it out.
