facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Incorrect detections on sequential dates #573

Closed arthurthlee closed 3 years ago

arthurthlee commented 3 years ago

Hello, Duckling seems to return incorrect results when several dates appear next to one another, particularly when they are mixed with numeric strings that are not dates.

Ex. 1: 2.04.123 Sept. 1, 2020 Aug. 20, 2020 N/A
Expected: Sept. 1, 2020 and Aug. 20, 2020
Actual: Aug. 20, 2020

Ex. 2: 5/11/2020 11/21/2019
Expected: 5/11/2020 and 11/21/2019
Actual: 11/2020 11/21

chessai commented 3 years ago

Can you share how you're inputting these? On the example webserver? Via wit.ai? I can't reproduce.

First example:

*Duckling.Core Duckling.Debug> debug (makeLocale EN Nothing) "Sept. 1, 2020 Aug. 20, 2020" [Seal Time]
intersect by ",", "of", "from" for year (Sept. 1, 2020)
-- <named-month> <day-of-month> (non ordinal) (Sept. 1)
-- -- September (Sept.)
-- -- -- regex (Sept.)
-- -- integer (numeric) (1)
-- -- -- regex (1)
-- regex (,)
-- year (latent) (2020)
-- -- integer (numeric) (2020)
-- -- -- regex (2020)
intersect by ",", "of", "from" for year (Aug. 20, 2020)
-- <named-month> <day-of-month> (non ordinal) (Aug. 20)
-- -- August (Aug.)
-- -- -- regex (Aug.)
-- -- integer (numeric) (20)
-- -- -- regex (20)
-- regex (,)
-- year (latent) (2020)
-- -- integer (numeric) (2020)
-- -- -- regex (2020)
[Entity {dim = "time", body = "Sept. 1, 2020", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2020-09-01 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2020-09-01 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 13, latent = False, enode = Node {nodeRange = Range 0 13, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 7, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 9}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "September"},Node {nodeRange = Range 6 7, token = Token Numeral (NumeralData {value = 1.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 6 7, token = Token RegexMatch (GroupMatch ["1"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month> (non ordinal)"},Node {nodeRange = Range 7 8, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 9 13, token = Token Time TimeData{latent=True, grain=Year, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 13, token = Token Numeral (NumeralData {value = 2020.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 9 13, token = Token RegexMatch (GroupMatch ["2020"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "year (latent)"}], rule = Just "intersect by \",\", \"of\", \"from\" for year"}},
Entity {dim = "time", body = "Aug. 20, 2020", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2020-08-20 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2020-08-20 00:00:00 -0200, vGrain = Day})] Nothing), start = 14, end = 27, latent = False, enode = Node {nodeRange = Range 14 27, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 14 21, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 14 18, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 8}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 14 18, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "August"},Node {nodeRange = Range 19 21, token = Token Numeral (NumeralData {value = 20.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 19 21, token = Token RegexMatch (GroupMatch ["20"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month> (non ordinal)"},Node {nodeRange = Range 21 22, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 23 27, token = Token Time TimeData{latent=True, grain=Year, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 23 27, token = Token Numeral (NumeralData {value = 2020.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 23 27, token = Token RegexMatch (GroupMatch ["2020"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "year (latent)"}], rule = Just "intersect by \",\", \"of\", \"from\" for year"}}]
*Duckling.Core Duckling.Debug> length it
2

(two distinct times)

Second example:

*Duckling.Core Duckling.Debug> debug (makeLocale EN Nothing) "5/11/2020 11/21/2019" [Seal Time]
mm/dd/yyyy (5/11/2020)
-- regex (5/11/2020)
intersect (11/2020 11/21)
-- mm/yyyy (11/2020)
-- -- regex (11/2020)
-- mm/dd (11/21)
-- -- regex (11/21)
mm/dd/yyyy (11/21/2019)
-- regex (11/21/2019)
[Entity {dim = "time", body = "5/11/2020", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2020-05-11 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2020-05-11 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 9, latent = False, enode = Node {nodeRange = Range 0 9, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 9, token = Token RegexMatch (GroupMatch ["5","11","2020"]), children = [], rule = Nothing}], rule = Just "mm/dd/yyyy"}},
Entity {dim = "time", body = "11/2020 11/21", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2020-11-21 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2020-11-21 00:00:00 -0200, vGrain = Day})] Nothing), start = 2, end = 15, latent = False, enode = Node {nodeRange = Range 2 15, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 2 9, token = Token Time TimeData{latent=False, grain=Month, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 2 9, token = Token RegexMatch (GroupMatch ["11","2020"]), children = [], rule = Nothing}], rule = Just "mm/yyyy"},Node {nodeRange = Range 10 15, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 10 15, token = Token RegexMatch (GroupMatch ["11","21"]), children = [], rule = Nothing}], rule = Just "mm/dd"}], rule = Just "intersect"}},
Entity {dim = "time", body = "11/21/2019", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2019-11-21 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2019-11-21 00:00:00 -0200, vGrain = Day})] Nothing), start = 10, end = 20, latent = False, enode = Node {nodeRange = Range 10 20, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 10 20, token = Token RegexMatch (GroupMatch ["11","21","2019"]), children = [], rule = Nothing}], rule = Just "mm/dd/yyyy"}}]
*Duckling.Core Duckling.Debug> length it
3

(three distinct times)
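In the second example the middle entity, 11/2020 11/21 (start 2, end 15), is an intersect built from the tail of the first date (mm/yyyy) and the head of the second (mm/dd), so it overlaps both real dates. Until the rule itself changes, a caller that only wants non-overlapping dates could post-filter the entity list on the client side. The sketch below is a hypothetical workaround, not part of Duckling's API: Span, overlaps, nonOverlapping and mkSpan are illustrative names standing in for the start, end and body fields of Duckling's Entity, and the "earliest start, then longest" preference is an assumption that suits this input but is not guaranteed to pick the right span in general.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical stand-in for the start, end and body fields of a Duckling
-- Entity: a half-open character range [spanStart, spanEnd) plus its text.
data Span = Span
  { spanStart :: Int
  , spanEnd   :: Int
  , spanBody  :: String
  } deriving (Show, Eq)

-- Two half-open ranges overlap when each one starts before the other ends.
overlaps :: Span -> Span -> Bool
overlaps a b = spanStart a < spanEnd b && spanStart b < spanEnd a

-- Greedy left-to-right selection (an assumed heuristic): visit spans by
-- ascending start, longer span first on ties, and keep a span only if it
-- does not overlap anything already kept.
nonOverlapping :: [Span] -> [Span]
nonOverlapping = reverse . foldl keep [] . sortBy order
  where
    order = comparing (\s -> (spanStart s, Down (spanEnd s)))
    keep kept s
      | any (overlaps s) kept = kept
      | otherwise             = s : kept

-- Rebuild a span from offsets into the original input text.
mkSpan :: String -> Int -> Int -> Span
mkSpan txt from to = Span from to (take (to - from) (drop from txt))

main :: IO ()
main = do
  let input = "5/11/2020 11/21/2019"
      -- Offsets copied from the start/end fields in the debug output.
      detected = [mkSpan input 0 9, mkSpan input 2 15, mkSpan input 10 20]
  mapM_ (putStrLn . spanBody) (nonOverlapping detected)
```

With the three ranges from the debug output it prints 5/11/2020 and 11/21/2019, dropping the straddling intersect match. Because the selection is greedy, a spurious span that starts before a real date would still win, so this is a heuristic rather than a fix for the intersect rule.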

arthurthlee commented 3 years ago

On both the webserver and on wit.ai.

Sorry, I forgot to mention that the dates are preceded by numeric strings that are not dates, i.e. the "2.04.123" in the first example (which isn't a date, just an arbitrary numeric code in a medical document).

For the second example, the full document is below:

arthurthlee commented 3 years ago

[PHARMACY] COVERAGE GUIDELINES SECTION: DRUGS

ORIGINAL EFFECTIVE DATE: LAST REVIEW DATE: LAST CRITERIA REVISION DATE: ARCHIVE DATE:

10/1/2013 5/11/2020 11/21/2019

NON-PREFERRED BLOOD GLUCOSE METER TEST STRIPS Coverage for services, procedures, medical devices and drugs are dependent upon benefit eligibility as outlined in the member's specific benefit plan. This Pharmacy Coverage Guideline must be read in its entirety to determine coverage eligibility, if any. This Pharmacy Coverage Guideline provides information related to coverage determinations only and does not imply that a service or treatment is clinically appropriate or inappropriate. The provider and the member are responsible for all decisions regarding the appropriateness of care. Providers should provide BCBSAZ complete medical rationale when requesting any exceptions to these guidelines. The section identified as “Description” defines or describes a service, procedure, medical device or drug and is in no way intended as a statement of medical necessity and/or coverage. The section identified as “Criteria” defines criteria to determine whether a service, procedure, medical device or drug is considered medically necessary or experimental or investigational. State or federal mandates, e.g., FEP program, may dictate that any drug, device or biological product approved by the U.S. Food and Drug Administration (FDA) may not be considered experimental or investigational and thus the drug, device or biological product may be assessed only on the basis of medical necessity. Pharmacy Coverage Guidelines are subject to change as new information becomes available. For purposes of this Pharmacy Coverage Guideline, the terms "experimental" and "investigational" are considered to be interchangeable. BLUE CROSS®, BLUE SHIELD® and the Cross and Shield Symbols are registered service marks of the Blue Cross and Blue Shield Association, an association of independent Blue Cross and Blue Shield Plans. All other trademarks and service marks contained in this guideline are the property of their respective owners, which are not affiliated with BCBSAZ. 
This Pharmacy Coverage Guideline does not apply to FEP or other states’ Blues Plans. Information about medications that require precertification is available at www.azblue.com/pharmacy. Some large (100+) benefit plan groups may customize certain benefits, including adding or deleting precertification requirements. All applicable benefit plan provisions apply, e.g., waiting periods, limitations, exclusions, waivers and benefit maximums. Precertification for medication(s) or product(s) indicated in this guideline requires completion of the request form in its entirety with the chart notes as documentation. All requested data must be provided. Once completed the form must be signed by the prescribing provider and faxed back to BCBSAZ Pharmacy Management at (602) 864-3126 or emailed to Pharmacyprecert@azblue.com. Incomplete forms or forms without the chart notes will be returned. Page 1 of 5

PHARMACY COVERAGE GUIDELINES SECTION: DRUGS

ORIGINAL EFFECTIVE DATE: LAST REVIEW DATE: LAST CRITERIA REVISION DATE: ARCHIVE DATE:

10/1/2013 5/11/2020 11/21/2019

NON-PREFERRED BLOOD GLUCOSE METER TEST STRIPS BCBSAZ covers all Lifescan™ BGM test strips with applicable quantity level limitations without prior authorization, all other BGM test strips require precertification and approval is based on medical necessity. Criteria: • Criteria for initial therapy: Non-Lifescan FDA-approved blood glucose meter strips is considered medically necessary and will be approved with ONE of the following:

  1. Medical record documentation of BOTH of the following: • Member is unable to use the preferred Lifescan product for testing blood sugar due to additional features not found on preferred product or is unable to use features of the preferred product • Member is unable to use BOTH: • Dexcom G6 Continuous Glucose Monitor AND • Freestyle Libre Pro Flash Continuous Glucose Monitoring System
  2. Member uses an insulin pump system where the pump is controlled by a non-preferred meter that requires a specific non-Lifescan glucose meter strip AND member is unable to use ONE of the following: • Dexcom G6 Continuous Glucose Monitor • Freestyle Libre Pro Flash Continuous Glucose Monitoring System Initial approval duration: 12 months • Criteria for continuation of coverage (renewal request): Non-Lifescan FDA-approved blood glucose meter strips is considered medically necessary and will be approved with ONE of the following:
  3. Individual continues to need a non-preferred Lifescan product that has additional features not found on preferred product or is unable to use features of the preferred product and member is unable to use BOTH: • Dexcom G6 Continuous Glucose Monitor AND • Freestyle Libre Pro Flash Continuous Glucose Monitoring System
  4. Continue use of an insulin pump that is controlled by a non-preferred meter that requires a specific nonLifescan glucose meter strip and member is unable to use ONE of the following: • Dexcom G6 continuous glucose monitor (CGM) OR Freestyle Libre Pro Flash Continuous Glucose Monitoring System • If they are already using another CGMS system with their insulin pump Renewal duration: 12 months

Description: Regular glucose monitoring is one way people with diabetes can learn more about their condition that allows them to make important decisions about medication dosage, exercise, and diet. Page 2 of 5

PHARMACY COVERAGE GUIDELINES SECTION: DRUGS

ORIGINAL EFFECTIVE DATE: LAST REVIEW DATE: LAST CRITERIA REVISION DATE: ARCHIVE DATE:

10/1/2013 5/11/2020 11/21/2019

NON-PREFERRED BLOOD GLUCOSE METER TEST STRIPS Keeping track of blood glucose values can help to manage this condition. Testing blood sugar level is one of the best ways to understand diabetes and how different foods, medications, and activities affect diabetes. Individuals with diabetes can use portable blood glucose meters (BGM), to check blood sugar levels by analyzing a small amount of blood from a fingertip. A lancet is used to prick the skin to obtain the blood. The blood goes on a glucose test strip that is then inserted into the glucose meter. There are numerous commercially available BCMs to choose from; they differ in several ways that can include: amount of blood needed for each test, testing speed, overall size and weight of meter, meters ability to store test results in memory, cost of the meter, and cost of the test strips used. Continuous glucose monitoring of the interstitial fluid is another technique of automatically measuring glucose levels throughout the day to provide trends in glucose measurements. In contrast to traditional isolated blood glucose levels, these monitors test glucose levels without routine finger pricks and automatically measure glucose readings throughout the day. Finger prick glucose readings and readings from interstitial fluid will not match but the readings from interstitial fluid have been proven to reliably reflect blood glucose levels. Interstitial glucose readings are sometimes referred to as sensor glucose readings. Interstitial monitors may be integrated (combined) with external insulin infusion pumps or non-integrated. According to the FDA labeling, monitors are not intended to be an alternative to traditional self-monitoring of blood glucose levels but rather provide adjunct monitoring, supplying additional information on glucose trends that are not available from self-monitoring. Tight glucose control in patients with diabetes has been associated with improved health outcomes. 
Several devices are available to measure glucose levels automatically and frequently (e.g., every 5-10 minutes). Devices that measure glucose in the interstitial fluid are approved as adjuncts to traditional self-monitoring of blood glucose levels. These devices can be used on an intermittent (short-term) basis or a continuous (long-term) basis. Several insulin pump systems have a built-in continuous glucose monitor (CGM). Insulin pumps systems with a built-in CGM and a low-glucose suspend (LGS) feature are also available and are referred to as artificial pancreas device systems. Examples include: Accu-Chek® Combo System, Medtronic™ MiniMed™ 530G, 630G and 670G, and OmniPod® Insulin Management System. BCBSAZ covers all Lifescan™ BGM test strips with applicable quantity level limitations without precertification, all other strips require precertification and approval is based on medical necessity. Members and their providers are NOT required to participate in the exception precertification program when obtaining BCBSAZ preferred Lifescan BGM test strips. For non-preferred BGM test strips members and their prescribers will be required to participate in the precertification process and the prescriber must complete a precertification request form with information regarding the medical necessity for use of another brand of blood glucose meter test strip. There are several meter characteristic may influence the choice of the type of device and includes: whole blood versus plasma glucose concentration results – whole blood is 10-15% lower than plasma (lab results), measurement range, ability to download results to a computer or internet to the provider, ability to average Page 3 of 5

PHARMACY COVERAGE GUIDELINES SECTION: DRUGS

ORIGINAL EFFECTIVE DATE: LAST REVIEW DATE: LAST CRITERIA REVISION DATE: ARCHIVE DATE:

10/1/2013 5/11/2020 11/21/2019

NON-PREFERRED BLOOD GLUCOSE METER TEST STRIPS glucose values, whether it also measures ketones, effect of altitude on meter function, ability to use alternate site testing, audio capabilities, backlighting, temperature (high and low) range at which the meter will work, ease of use (number of steps required), other dexterity issues for patients with severe arthritis, need for a user code, when and how to calibrate, display size, meter cleaning requirements, cost/ease of replacement of batteries, and availability of customer support. According to the U.S. Food and Drug Administration (FDA), an artificial pancreas is a medical device that links a glucose monitor to an insulin infusion pump, and the pump automatically reduces and increases insulin delivery according to measured subcutaneous glucose levels using a control algorithm. Because control algorithms can vary significantly, there are a variety of artificial pancreas device systems currently under development. These systems span a wide range of designs from a low-glucose suspend (LGS) device systems to the more complex bi-hormonal control-to-target systems. There are 3 main categories of artificial pancreas device systems: 1) threshold suspend device, 2) control-to-range, and 3) control-to-target systems. With threshold suspend device systems, also called LGS systems, the delivery of insulin is suspended for a set time when 2 glucose levels are below a specified low level indicating hypoglycemia. Using control-to-range systems, the patient sets his or her own insulin dosing within a specified range, but the artificial pancreas device system takes over if glucose levels outside that range (higher or lower). Patients using this type of system still need to check blood glucose levels and administer insulin as needed. With control-to-target systems, the device aims to maintain glucose levels near a target level (e.g., 100 mg/dL). 
Control-to-target systems are automated and do not require user participation except to calibrate the CGM system. Several device subtypes are being developed: those that deliver insulin-only, bi-hormonal systems, and hybrid systems. Continuous Glucose Monitors Devices Include: ▪ Continuous Glucose Recorder Monitoring System (CGMS®) (Medtronic, MiniMed) ▪ Dexcom® STS CGMS ▪ Dexcom™ STS-7™ CGMS ▪ Dexcom® G4 Platinum CGMS ▪ Dexcom® G5 Mobile CGMS ▪ Dexcom® G6 CGMS ▪ FreeStyle™ Navigator® CGMS (Abbott) ▪ Guardian® RT (Real-Time) CGMS (Medtronic, MiniMed) ▪ Freestyle™ Libre® Pro Flash Continuous Glucose Monitoring System FDA Approved Combined CGM with External Insulin Pump Devices Include: ▪ Paradigm® REAL-Time System (second generation called Paradigm Revel System) (Medtronic, MiniMed) for age seven years and older. FDA Approved Artificial Pancreas Device Systems: ▪ MiniMed® 530G System: Page 4 of 5

PHARMACY COVERAGE GUIDELINES SECTION: DRUGS

ORIGINAL EFFECTIVE DATE: LAST REVIEW DATE: LAST CRITERIA REVISION DATE: ARCHIVE DATE:

10/1/2013 5/11/2020 11/21/2019

NON-PREFERRED BLOOD GLUCOSE METER TEST STRIPS Is classified as a LGS or threshold suspend device system which integrates an insulin pump and glucose meter. The threshold suspend tool temporarily suspends insulin delivery when the sensor glucose level is equal to or lower than a pre-set threshold within the 60-90 mg/dL range. When the glucose value reaches this threshold, an alarm sounds. If individuals respond to the alarm, they can choose to continue or cancel the insulin suspend feature. If individuals fail to respond to the alarm, the pump automatically suspends action for 2 hours, and then insulin therapy resumes. The system is approved for use in individuals 16 years and older. ▪ MiniMed® 630G System with SmartGuard™: LGS or threshold suspend device system that is similar to the 530G but offers updates to the system components including waterproofing. The system is approved for use in individuals 16 years and older. ▪ MimiMed® 670G System: Is classified as a hybrid closed-loop insulin delivery system. It consists of an insulin pump, a glucose meter and a transmitter, linked by a proprietary algorithm, the SmartGuard HCL. The system includes an LGS feature that suspends insulin delivery when glucose levels get low and has an optional alarm. Additionally, the system involves semi-automatic insulin level adjustment to preset targets. Basal insulin levels are automatically adjusted but the individual needs to administer pre-meal insulin boluses. The system is approved for use in individuals 14 years and older with type 1 diabetes. It is contraindicated in children under age 7 and in individuals who require less than a total daily insulin dose of 8 units. Non-FDA Approved Devices Include: ▪ Eversense™ Implantable CGM System: has been investigated to continually measure interstitial fluid glucose levels in adults with diabetes. System includes transmitter with alert feature and a sensor that is implanted subcutaneously for up to 90 days.

Page 5 of 5

chessai commented 3 years ago

Can you show Duckling itself, not wit.ai, producing the incorrect results?

chessai commented 3 years ago

Ah, you did share duckling.wit.ai. That instance is out of date by a fair bit; if you're interacting with Duckling directly, you'll want to use the example webserver in this repo instead. I think the Duckling backing wit.ai itself is the most recent version, though. I'll have to get back to you on that.

chessai commented 3 years ago

see #589