Closed: dmitra79 closed this issue 4 years ago
Hi, great question! Our code path for training without label dependencies is much more optimized than training with label dependencies (the moments are easier to calculate), so that's why you're seeing the runtime gap, and there isn't a simple way to parallelize training in the current implementation.
Have you evaluated the performance (in terms of accuracy or F1) of the label model with and without label dependencies? In practice we've often found that performance is still acceptable without dependencies, especially when you don't have too many labeling functions (like your 9).
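For concreteness, here is a minimal, hypothetical sketch of the comparison being suggested: score both variants of the label model against a small hand-labeled dev set and compare F1. The prediction arrays below are placeholders (they would come from a model fit with `lambda_edges` and one fit without); the `f1` helper is written out inline so the sketch is self-contained.

```python
def f1(y_true, y_pred):
    # Binary F1 from scratch: harmonic mean of precision and recall.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Placeholder dev-set labels and model predictions (hypothetical values).
y_dev = [1, 1, 0, 1, 0, 0, 1, 0]
preds_without_deps = [1, 0, 0, 1, 0, 1, 1, 0]  # model fit without lambda_edges
preds_with_deps    = [1, 1, 0, 1, 0, 1, 1, 0]  # model fit with lambda_edges

f1_without = f1(y_dev, preds_without_deps)
f1_with = f1(y_dev, preds_with_deps)
print(f"F1 without dependencies: {f1_without:.3f}")
print(f"F1 with dependencies:    {f1_with:.3f}")
```

If the two scores are close on your dev set, the much faster dependency-free code path may be the better trade-off at 47M rows.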
Thank you for the reply! Unfortunately we have few ground-truth labels to evaluate the performance with, but we'll try it.
Hello,
I tried training on 100K records with 9 weak labels: training takes 0.02 s without lambda_edges, but 7 s with 1 edge, 18 s with 2 edges, and 21 s with 3 edges. Is this expected behavior? Are there ways to speed it up or parallelize? (I have multiple datasets with 47M rows each, so assuming linear scaling in the number of records, training on each would take almost 3 h...)
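The "almost 3h" figure above follows from a simple linear extrapolation of the measured timings; spelled out:

```python
# Back-of-the-envelope check of the linear-scaling estimate in the question.
records_measured = 100_000      # rows used in the timing test
seconds_measured = 21           # slowest measured fit (3 lambda edges)
records_target = 47_000_000     # rows in each full dataset

scale = records_target / records_measured  # 470x more rows
est_seconds = seconds_measured * scale     # extrapolated fit time in seconds
est_hours = est_seconds / 3600
print(f"Estimated fit time: {est_seconds:.0f} s (~{est_hours:.1f} h)")
```

Of course this assumes training cost really is linear in the number of records; if it grows faster than linearly, the estimate is optimistic.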
Thank you!