facebookresearch / Clinical-Trial-Parser

Library for converting clinical trial eligibility criteria to a machine-readable format.
Apache License 2.0
163 stars 58 forks

improve cfg_parse.sh result #5

Open mlaria opened 4 years ago

mlaria commented 4 years ago

Is there a way to configure and improve the cfg_parse results? At the moment, with my test data set, it does not seem stable enough to parse even a simple criterion such as age. For example, sometimes it detects the age criterion and sometimes it doesn't, even when the text looks straightforward, like the following:

eligibility_critera
Inclusion Criteria: - Adult patients, 18-75 years of age.

My results seem to show that cfg_parse misses more than 50% of the eligibility criteria items, even with the example CSV in this repository. Perhaps I'm doing something wrong. Can anyone help me understand the cfg_parse results better? Thanks!

salkola commented 4 years ago

The CFG parser can be improved by updating the grammar production rules. A criterion example that you want to be able to parse can be added to interpreter_test.go as a new test case.

We prefer to bias the CFG parser toward higher precision than recall, because the likelihood of falsely rejecting an eligible participant is lower.
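To make the precision/recall trade-off concrete, here is a minimal Python sketch of how the two metrics are computed. The function name and the counts are hypothetical and chosen only for illustration; they are not measured on this repository's data.

```python
from __future__ import annotations


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) computed from raw counts."""
    predicted = true_pos + false_pos  # everything the parser extracted
    actual = true_pos + false_neg     # everything it should have extracted
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: the parser extracts 45 criteria, 43 of them correctly,
# and misses 57 criteria that are present in the text.
precision, recall = precision_recall(true_pos=43, false_pos=2, false_neg=57)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With counts like these, most of what the parser extracts is correct even though it misses more than half of the criteria, which is the kind of behavior a precision-biased parser would show.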

Although not related to parsing eligibility criteria text, note that the eligibilities table has the minimum and maximum age limits of the clinical trials.

c370300679 commented 2 years ago

@salkola In the production rules, what do the letters in the nonterminals mean?

S -> C
C -> C X | R
X -> O R | R
R -> V A | A V | V
V -> V1 V2 | V1
V2 -> H V1
A -> L Y | Y Y | B W | B B | B | E
E -> E N | E Z | N
Z -> O N
B -> T L | L T
W -> O B
L -> N U | N
Y -> D L

I tried to deduce their meaning but failed. Could you please explain what each letter stands for? Many thanks!
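As a starting point for reading the grammar, the rules quoted above can be loaded into a small Python structure to see which symbols are expanded by other rules and which never appear on a left-hand side. This sketch does not tell you what each letter stands for. It only separates the structural symbols from the leaf symbols, which are presumably assigned to tokens before the grammar is applied. The helper `parse_rules` is hypothetical and not part of the repository.

```python
# Production rules copied from the comment above, one rule per line.
GRAMMAR = """
S -> C
C -> C X | R
X -> O R | R
R -> V A | A V | V
V -> V1 V2 | V1
V2 -> H V1
A -> L Y | Y Y | B W | B B | B | E
E -> E N | E Z | N
Z -> O N
B -> T L | L T
W -> O B
L -> N U | N
Y -> D L
"""


def parse_rules(text):
    """Map each left-hand symbol to its alternatives (each a list of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        rules[lhs.strip()] = [alt.split() for alt in rhs.split("|")]
    return rules


rules = parse_rules(GRAMMAR)

# Symbols that occur on a right-hand side but are never expanded themselves.
rhs_symbols = {sym for alts in rules.values() for alt in alts for sym in alt}
leaf_symbols = sorted(rhs_symbols - set(rules))
print(leaf_symbols)  # ['D', 'H', 'N', 'O', 'T', 'U', 'V1']
```

The seven leaf symbols are the ones whose meaning has to come from outside these rules, for example from wherever the parser tags tokens, so that is probably the place to look for their definitions.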