Remove default label - Githubissues

matt-gardner commented 7 years ago

As per #265. I also took care of some TODOs in the dataset code while I was at it, just cleaning up the dataset API a bit. This looks a lot bigger than it really is, because I cut-and-pasted three methods, which makes up the majority of the line changes in this PR.

DeNeutoy commented 7 years ago

LGTM. Also, we need to be a bit more rigorous about assigning ourselves to issues, because i'd also almost finished this separately (just as much my fault too though, as I should have assigned myself when I started working on it on Friday). I was going for a slightly larger change which pins the format of instances to include labels and introduces a new instance for Evidence, which don't have labels (e.g for background in Background Instances), but this looks pretty good. I also just realised that my new background type requires the instance to be just a sentence, which wouldn't support tuples, so keeping background instances as actual instances is probably better.

matt-gardner commented 7 years ago

Sorry, I was looking over the issues for something quick to do last night, and this one looked easy enough. Yeah, if you start working on something, be sure to grab the issue, either with a comment or by assigning it.

Trying to understand what you were working on - what do you mean by "pins the format of instances to include labels"? And you were thinking of introducing Evidence as a new abstract superclass, with BackgroundInstance as a subclass? Or replacing BackgroundInstance with something new?

DeNeutoy commented 7 years ago

I was basically trying to make it so that the instances we use for background don't have labels at all, not just that they are None. I'm not sure it was massively useful.

matt-gardner commented 7 years ago

Oh, I get it. Yeah, that's a useful thing to want to do.

I think that our current set up already does this, though (I'm just explaining here, trying to help you understand some engineering decisions, not criticizing what you were doing). In order for BackgroundInstance to be usable in a Dataset, e.g., so we can actually get a label out with as_training_data, it has to have some label associated with it. But it doesn't have a label itself, it uses the label of the Instance that it wraps.

So, we treat BackgroundInstances as wrappers, similar to how BufferedReader wraps a FileReader in java. The other way to do this is to have some kind of side-information, like you were thinking, and change Instance to have a field for optional side information. This would work too, but it would mean that Instance has a field that will probably be empty almost all of the time, adding complexity to the base class for a non-standard case. These both accomplish the same end goal, but I think doing it as a wrapper results in a cleaner API.

DeNeutoy commented 7 years ago

I'd managed to alter both Dataset.as_training_data and TextTrainer.__pad_and_convert_dataset to be agnostic to whether they were being passed training data with or without labels (although this was only useful for the bits in DifferentiableSearch where we need to just get the inputs and not the labels of background, which was why I said I didn't think this change was actually too useful, given the only format which requires no labels is this case). But yeah, thanks.

allenai / deep_qa

Remove default label #325