Closed BeckySharp closed 3 years ago
@reynoldsm88 -- the case class here (CdrDocument, which in turn uses the case class SentenceScore) is what you'll be targeting for the mapping you'll do on your end.
I tried to make the SentenceScore definition match closely with what you already have in the CDR but IDK what you create when you instantiate the CDR object, so if there are things that I can adjust in the definition to make the mapping more trivial, please let me know.
@johnhungerford
I think I fixed the merge conflict right, but sorry if not
Thank you, @BeckySharp!
ok, modified the case classes bc @johnhungerford indicated they already slice up the doc into sentences, so we can use that and avoid offset alignment sadness.
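For anyone mapping onto these classes, here is a rough sketch of what a sentence-sliced shape could look like. The field names (docId, sentences, text, score) are assumptions for illustration only and are not taken from the PR; the real CdrDocument and SentenceScore definitions may differ.

```scala
// Hypothetical sketch only: field names are guesses, not the definitions in this PR.
object CdrSketch {
  // One pre-sliced sentence with its relevance score; no character offsets needed.
  final case class SentenceScore(text: String, score: Double)

  // A document is an id plus its already-segmented, scored sentences.
  final case class CdrDocument(docId: String, sentences: Seq[SentenceScore])

  // Build a small example document from (sentence, score) pairs.
  def example: CdrDocument =
    CdrDocument(
      docId = "doc-1",
      sentences = Seq(
        SentenceScore("rainfall reduced crop yields", 0.9),
        SentenceScore("see the appendix", 0.1)
      )
    )
}
```

Because the sentences arrive already split, each score attaches directly to its sentence and no offset alignment step is required.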
re: unused stuff, yes, happened in the merge conflict, fixed
re: idx etc and other formats, how much of this do you want me to address vs a student?
note @MihaiSurdeanu that i commented out the tests bc the input format changed a lot, so they will need to change too
@BeckySharp The idx thing was something that was not in the previous code, so presumably you're the best one to decide if you need it or not. I'm not sure what "and other formats" refers to. Was it the more efficient rewrite of your filtering by threshold code?
it was there when i grabbed the branch, but maybe someone had already removed it when you looked (which would also explain the merge conflicts even though I started with the wg_1 branch). I really have no idea why it was there, but again, I don't need it so anyone can remove it
re: "and other formats"
IDK what of your review still applies now that I've rewritten things with the diff input. But, if you still want things done differently/more efficiently, please let me / @JerryZeyu / @ZhengTang1120 know.
@JerryZeyu I've restored the tests, so you don't need to do that. But it would be good if you could try to add an additional test of @BeckySharp's feature where some of the input sentences are filtered if they don't have a high enough score. That would probably mean changing texts from a Seq[Seq[String]] to a Seq[Seq[(String, Double)]] and then making up scores for each sentence. (It wouldn't matter what scores you make up, as long as you test that the resulting concepts are appropriately filtered.)
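A minimal sketch of what such a test could check, assuming a plain threshold filter. The helper filterByScore, the 0.5 threshold, and the sample sentences are all made up for illustration; the real test would call the project's own filtering code and assert on the resulting concepts rather than on the raw sentences.

```scala
// Hypothetical sketch: filterByScore is a stand-in, not the project's actual API.
object ThresholdFilterSketch {
  // Keep only the sentences whose made-up score meets the threshold.
  def filterByScore(texts: Seq[Seq[(String, Double)]], threshold: Double): Seq[Seq[String]] =
    texts.map(_.collect { case (sentence, score) if score >= threshold => sentence })

  def main(args: Array[String]): Unit = {
    // texts is now Seq[Seq[(String, Double)]] instead of Seq[Seq[String]].
    val texts: Seq[Seq[(String, Double)]] = Seq(
      Seq(("rainfall reduced crop yields", 0.9), ("see the appendix", 0.1)),
      Seq(("food prices rose sharply", 0.7))
    )
    val kept = filterByScore(texts, threshold = 0.5)
    // The low-scoring sentence should be gone; the others should survive in order.
    assert(kept == Seq(Seq("rainfall reduced crop yields"), Seq("food prices rose sharply")))
  }
}
```

The exact scores are arbitrary; what matters is that at least one sentence falls on each side of the threshold so the test can tell filtered input from unfiltered input.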
I think this can be merged. The extra test Zeyu should write can go into the wg1 branch after the merge.
thanks @bethard !
FYI @JerryZeyu @ZhengTang1120 @reynoldsm88