Closed isaac-gs closed 8 years ago
I think the hub should handle initial data modifications, since there might be some preprocessing that needs to be done. In addition to what @ASAAR has mentioned, another thing the hub could handle is defaults for empty data.
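To make "defaults for empty data" concrete, here is a rough sketch of what a hub-side step could look like. Everything in it is hypothetical: the `fill_defaults` name, the idea that records are plain dicts, and the choice to treat `None` and empty strings as "empty" are all assumptions for illustration, not anything we have agreed on.

```python
from typing import Any, Dict, List

# Hypothetical record shape: one dict per row, field name -> value.
Record = Dict[str, Any]


def fill_defaults(records: List[Record], defaults: Record) -> List[Record]:
    """Return copies of the records with empty fields replaced by defaults.

    A field counts as empty when it is missing, None, or an empty string.
    The input records are left unmodified.
    """
    filled = []
    for record in records:
        row = dict(record)  # shallow copy so the caller's data is untouched
        for field, default in defaults.items():
            value = row.get(field)
            if value is None or value == "":
                row[field] = default
        filled.append(row)
    return filled


if __name__ == "__main__":
    raw = [{"age": None, "name": "Ada"}, {"age": 31, "name": ""}]
    print(fill_defaults(raw, {"age": 0, "name": "unknown"}))
```

Keeping this as a pure function that copies its input would make it easy to move between the hub and a worker later if we change our minds.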
Yeah, I can agree with that. We'll just need to remain careful. Do we also want to split the data on the Hub or not? It would mean that every model gets the same split. I don't know if that's a good or bad thing, but we could always change it later (as long as we make it modular enough).
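For the "same split for every model, but modular" idea, one possible shape is a small seeded split function. This is only a sketch: `split_data`, its parameters, and the default 80/20 ratio are made up for illustration. The point is that a fixed seed gives every caller the same split, and moving the function (or passing a different seed) changes that behaviour without touching anything else.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_data(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Split rows into (train, test) deterministically for a given seed."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # Shuffle indices with a private RNG so global random state is untouched.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    n_test = int(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])

    # Both halves keep the original row order.
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [rows[i] for i in sorted(test_idx)]
    return train, test


if __name__ == "__main__":
    train, test = split_data(list(range(10)))
    print(len(train), len(test))
```

If the hub calls this once with a fixed seed, every worker sees the same split; if we later decide workers should split for themselves, they can call the same function with their own seed.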
@RyanMcBerg thoughts? Whenever I see data splitting I think of you.
Proposed data prep responsibilities (from the 10/30 meeting, in the order they should be handled):
Hub:
Worker:
Hey everyone, so for the most part we've decided that data cleaning and prep tasks should go in the DMZ for each worker to use. However, there is one more issue that I'd like to talk about.
What about basic cleaning and splitting of the data for training? Did we cover this? On the one hand, we don't want to repeat work. On the other, we don't want to take too much away from worker flexibility.
Example,
One last thing. If we have workers do their own data prep and cleaning, should we also let them request tickets with specific models or data configurations, so there is less repeated work? I realize that is a long-term concern, but it should be considered.
Thoughts? @asclines @RyanMcBerg @ZakeryFyke @telelu03