HazyResearch / deepdive

DeepDive
deepdive.stanford.edu

We should output "supervised" data points as well #416

Closed alldefector closed 8 years ago

alldefector commented 9 years ago

Looks like data points with filled is_correct values get excluded from Gibbs sampling. But the expectations on those data points are useful for both the dev process and the final data product.

Right now, people have to cumbersomely make copies of those supervised data points to force DD to run inference on them (@ajratner @zhangce @raphaelhoffmann, for example). I think I brought this up a while ago.
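
To make the workaround concrete, the copying looks roughly like this (purely illustrative; the has_spouse table and all column names besides is_correct are made up):

-- Purely illustrative sketch: duplicate every supervised candidate as an
-- unsupervised one (is_correct = NULL) so that DD runs inference on the copy.
INSERT INTO has_spouse (mention_id, person1_id, person2_id, is_correct)
SELECT mention_id || '_copy', person1_id, person2_id, NULL
FROM has_spouse
WHERE is_correct IS NOT NULL;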

Do you agree that we should always include supervised data points in inference? I realize that we may also need to change code of the sampler for this. Would that be a lot of work?

zhangce commented 9 years ago

@thodrek's new framework should ultimately provide a cleaner solution here. In the short term, it should also be an easy fix.

zifeishan commented 9 years ago

Agreed that this is very useful in many applications.

raphaelhoffmann commented 9 years ago

@zhangce @SenWu @feiranwang Can you show us a few lines of ddlog code that does this copying of supervised examples? Or should we already do it inside our UDFs?

feiranwang commented 9 years ago

@raphaelhoffmann This can be achieved by turning on the sampler argument --sample_evidence, which lets the sampler output probabilities for evidence variables as well. The sampler argument can be added in deepdive.conf, and when app.ddlog is compiled by deepdive run, this option will be incorporated.

zhangce commented 9 years ago

@feiranwang Thanks, Feiran! Yeah, I forgot you added this before...

ajratner commented 9 years ago

Yeah, sorry for not chiming in earlier. We ran into this problem in genomics, and @feiranwang added the --sample_evidence flag for us pre-ddlog, so we had been using it.

Confirming that there is a simple way to incorporate this with ddlog would be great too!

raphaelhoffmann commented 9 years ago

Thanks! That works! @alldefector I think we can close this issue.

alldefector commented 9 years ago

Hi @feiranwang, did you mean we can specify --sample_evidence in app.ddlog or on the deepdive run command line? I wasn't able to figure out how...

In light of people's confusion regarding this design choice, what do you guys think about turning the flag on by default?

raphaelhoffmann commented 9 years ago

+1. Turning it on by default (and supporting a --disable_evidence_sampling switch) would also let us avoid having to edit deepdive.conf.

feiranwang commented 9 years ago

@alldefector For a ddlog application, according to this doc (http://deepdive.stanford.edu/doc/advanced/deepdiveapp.html#structure), there can also be a deepdive.conf that contains extra configuration (sampler args, etc.). For example, deepdive.conf may look like:

deepdive.sampler.sampler_args: "-l 200 -s 1 -i 500 --alpha 0.1 --sample_evidence"
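
With the flag on, you can then read expectations for evidence variables back out with a query along these lines (just a sketch; it assumes the inference results land in the dd_inference_result_variables table and that candidates carry their factor graph variable id in an id column):

-- Sketch: with --sample_evidence on, supervised rows also get an expectation.
SELECT h.person1_id, h.person2_id, h.is_correct, r.expectation
FROM has_spouse h
JOIN dd_inference_result_variables r ON r.id = h.id;
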
chrismre commented 9 years ago

This should not be on by default.

xiaoling commented 9 years ago

Yeah, I was completely confused about why MindBender didn't include the supervised examples...

zhangce commented 9 years ago

@feiranwang @raphaelhoffmann @alldefector Following is my opinion on this discussion... tl;dr I think it is the user's responsibility to create copies or add the flag to sample evidence (I actually don't think we should ever have a flag to sample evidence, but that is a separate issue).

There are a couple of ways to deal with supervision data:

  1. Assume it is the user's responsibility to create copies if they need them;
  2. Directly include it into the inference result without running inference;
  3. Automatically run inference over copies and include them in the inference result.

1 is what we are doing.

2 is obviously wrong.

3 looks good at first glance because it saves you one query or flag. However, if 3 is not done correctly, it is very dangerous, because it leads to a badly flawed experimental design that does not decouple training and testing. How to take advantage of supervised evidence while still ensuring the experimental result does not cheat by looking at training examples is tricky and domain-specific, and I don't think we are ready to define the default here without confusing more users (and giving them wrong numbers).

3 looks easy for programs that mainly contain regressions. But what is 3's formal semantics in a general factor graph? When you have many general correlations between variables, how should these copies be connected to other variables? Should these copies be connected to copies of other variables? Or to the evidence of other variables? I think these decisions should be made by the user, by creating copies, if they know what they are doing.

feiranwang commented 9 years ago

Just to add a point on 1: users don't need to create copies of the training data; they just need to turn the sampler flag on. The sampler will then sample evidence variables during inference.

raphaelhoffmann commented 9 years ago

@zhangce I agree that there are different things you might want to do with DeepDive, and in some cases you do not want to sample evidence variables.

When building extractors with noisy distant supervision rules, however, you likely do want to sample evidence variables. We may at least want to emphasize this setting in the tutorials and examples since some users may not be aware.

zhangce commented 9 years ago

We may at least want to emphasize this setting in the tutorials and examples since some users may not be aware.

@raphaelhoffmann I agree that we should have a tutorial to tell users about different ways of treating distant supervision rules.

My comment was only about whether sample_evidence should be on by default, and I think it should not be.

thodrek commented 9 years ago

@raphaelhoffmann We are actually developing a formal framework that will address exactly this issue of noisy "evidence" from distant supervision rules. Maybe we can talk more in person. Indeed, sampling evidence variables is interesting in that context. As a new DeepDive user myself, I've already spent some time playing with the --sample_evidence flag, and I would like to second the argument that sample_evidence shouldn't be set to true by default. As @zhangce pointed out earlier, it may lead to weird results.

ajratner commented 9 years ago

3 looks easy for programs that mainly contain regressions. But what is 3's formal semantics in a general factor graph? When you have many general correlations between variables, how should these copies be connected to other variables?

@zhangce I think one of the big benefits of having the --sample_evidence flag is that users don't have to create copies, and we don't have to consider these issues. In fact, in the genomics code, before I was involved, I think there was some confusion due to having created copies, which is part of why I asked Feiran about adding the --sample_evidence flag in the first place...

In general, I actually think it's fairly simple to keep things 'clean': you just make sure that any numbers you report (e.g., precision) are based only on non-evidence variables; that way you get the benefit of potentially "flipping" one of the (noisy!) evidence variables to a correct value, but otherwise nothing else really changes at all (since the evidence is still pinned during learning...)
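
E.g., something like this (names entirely illustrative; gold_labels stands in for a hypothetical held-out label table):

-- Illustrative only: report precision over non-evidence variables, so the
-- sampled (still pinned-in-learning) evidence never inflates the numbers.
SELECT AVG(CASE WHEN g.is_true THEN 1.0 ELSE 0.0 END) AS precision
FROM has_spouse h
JOIN dd_inference_result_variables r ON r.id = h.id
JOIN gold_labels g
  ON g.person1_id = h.person1_id AND g.person2_id = h.person2_id
WHERE h.is_correct IS NULL    -- non-evidence variables only
  AND r.expectation > 0.9;    -- some prediction threshold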

@thodrek You literally just beat me to it :) My comment to you was going to be this (echoing @raphaelhoffmann's point): clearly, the noisier we suspect our evidence to be (i.e., the less accurate we think our distant supervision rules are), the more we would want to re-sample these variables after learning... this is definitely what makes it domain/application-specific (so sure, don't set the flag to true by default), but it seems like we can model this in our framework for DSRs, and potentially set the --sample_evidence flag automatically, even differently for different sets of evidence variables coming from DSRs of different noise levels, according to the model... :)

alldefector commented 9 years ago

Looks like there are two issues:

  1. Do we run inference on supervised data points or not? How?
  2. Should we give the user an output table that contains all extractions regardless of whether they are supervised or predicted?

I suppose everyone would agree that the answer to question 2 is yes. And not doing 2 is the source of lots of confusion; e.g., wondering why there is data loss, why MindBender doesn't index all the data, and why many DD programs make strange "unsupervised copies" of "supervised" tuples.

A regular user of DD doesn't have a good understanding of "evidence variables" or how Gibbs sampling works. To them, the input to DD is some source data and some seed dictionaries or patterns; the output should be all extractions / predictions. Some of the "predictions" may well be (supervision) rule-based, but they are still predictions nevertheless. With the upcoming "data programming" framework, I've heard that those rules will start to have weights and become not so black and white; by then, I suppose it'll become natural to have a probability for "supervised" predictions as well.

zhangce commented 9 years ago

Should we give the user an output table that contains all extractions regardless of whether they are supervised or predicted? I suppose everyone would agree that the answer to question 2 is yes.

With @thodrek's principled framework on noisy distant supervision, I would say yes.

If we are talking about the current version of DeepDive, and the question is whether we provide that table by default, I would say no. I feel like outputting supervised labels to the user by default as extractions might bring in more confusion, and raise questions about the training/testing split behind any numbers they get and we report. This is another line of confusion that we can currently avoid.

I do agree that we should have a tutorial about this that teaches users how to do whatever they decide to do themselves, but by default, I think DeepDive should not provide an output table that samples evidence variables unless the user explicitly forces the system to do so.

netj commented 8 years ago

Another incident of confusion here: HazyResearch/dd-genomics#261. Apparently, not having --sample_evidence on by default confuses even the most sophisticated users of DeepDive. I'd say keeping this off by default is a terrible design; we should rather communicate the important training/testing distinction in a more explicit way, e.g., by providing extra views of the inference results for all candidates while keeping the current views intact.
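
For instance, such an extra view could look like this (a rough sketch; all names hypothetical):

-- Rough sketch of the "extra views" idea: keep the current inference views
-- intact and add one that covers every candidate, supervised or not.
CREATE VIEW has_spouse_all_candidates AS
SELECT h.*, r.expectation
FROM has_spouse h
LEFT JOIN dd_inference_result_variables r ON r.id = h.id;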

alldefector commented 8 years ago

@netj Looks like this has been resolved by the above PR in 0.8. Feel free to reopen if that's not the case.