added case class and pruning by threshold

clulab / eidos

Machine reading system for World Modelers

Apache License 2.0

37 stars 24 forks source link

added case class and pruning by threshold #1004

Closed BeckySharp closed 3 years ago

BeckySharp commented 3 years ago

FYI @JerryZeyu @ZhengTang1120 @reynoldsm88

BeckySharp commented 3 years ago

@reynoldsm88 -- the case class here (CdrDocument, which in turn uses the case class SentenceScore) is what you'll be targeting for the mapping you'll do on your end.

I tried to make the SentenceScore definition match closely with what you already have in the CDR but IDK what you create when you instantiate the CDR object, so if there are things that I can adjust in the definition to make the mapping more trivial, please let me know.

BeckySharp commented 3 years ago

@johnhungerford

BeckySharp commented 3 years ago

I think I fixed the merge conflict right, but sorry if not

kwalcock commented 3 years ago

Thank you, @BeckySharp!

BeckySharp commented 3 years ago

ok, modified the case classes bc @johnhungerford indicated they already slice up the doc into sentences, so we can use that and avoid offset alignment sadness.

re: unused stuff, yes, happened in the merge conflict, fixed re: idx etc and other formats, how much of this do you want me to address vs a student?

BeckySharp commented 3 years ago

note @MihaiSurdeanu that i commented out the tests bc the input format changed a lot, so they will need to change too

bethard commented 3 years ago

@BeckySharp The idx thing was something that was not in the previous code, so presumably you're the best one to decide if you need it or not. I'm not sure what "and other formats" refers to. Was it the more efficient rewrite of your filtering by threshold code?

BeckySharp commented 3 years ago

s something that was not in the previous code, so presumably you're the best one to decide if you need it or not. I'm not sure what "and other formats" refers to. Was it the mo

it was there when i grabbed the branch, but maybe someone had already removed it when you looked (which would also explain the merge conflicts even though I started with the wg_1 branch). I really have no idea why it was there, but again, I don't need it so anyone can remove it

BeckySharp commented 3 years ago

and other formats

IDK what of your review still applies now that I re-write things with the diff input. But, if you still want things done differently/more efficiently, please let me/ @JerryZeyu / @ZhengTang1120 know.

bethard commented 3 years ago

@JerryZeyu I've restored the tests, so you don't need to do that. But it would be good if you could try to add an additional test of @BeckySharp's feature where some of the input sentences are filtered if they don't have a high enough score. That would probably mean changing texts from a Seq[Seq[String]] to a Seq[Seq[(String, Double)]] and then making up scores for each sentence. (It wouldn't matter what scores you make up, as long as you test that the resulting concepts are appropriately filtered.)

bethard commented 3 years ago

I think this can be merged. The extra test Zeyu should write can go into the wg1 branch after the merge.

BeckySharp commented 3 years ago

I think this can be merged. The extra test Zeyu should write can go into the wg1 branch after the merge.

thanks @bethard !