Closed yaeldekel closed 5 years ago
Hi @yaeldekel, this this seems like a good first step. We will need at least what you've proposed here I think, so adding these would certainly not be harmful. I'd also add that saving the input schema itself in the case where the loaded data is, say, programmatically defined may also be necessary. (Sometimes you aren't loading using a loader at all, but this does not mean preserving the input schema is any less important.)
There's also a few more interesting things. You'll note the presence of this IDataReader<TSource> reader
. Yet, IDataReader<in TSource>
does not depend on ICanSaveModel
(we made transformers descend from it, per @artidoro's #2431. It seems like it probably should.
It also seems to me that the presence of this <in TSource>
is an interesting wrinkle; back when we had pipelines based on the old IDataLoader
, there was no confusion about what the type of that was -- always an IMultiStreamSource
. Yet in the new world, it could be practically anything. Or it could be nothing at all, but we might still want to be able to load the schema. (But this seems to me to have devolved into the dual of the point I made in the first paragraph. It seems like we'll need a way to load a model file and get only what had been the GetOutputSchema()
on the reader if specified, or the data view schema if that was what was saved with the pipeline.)
Anyway: think the work you've proposed is a positive first step, and I think we should give it a shot, But it seems to me we need to develop this idea more fully. Those are just the most obvious holes in the idea I see right off the bat, there may be more, or solutions might become more obvious once we start practically working on it, as I find is often the case.
If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.
Do you view this as something that cannot be fixed after v1?
If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.
Do you view this as something that cannot be fixed after v1?
I consider the APIs need to change, since they are saving "incomplete" models. So I'd like to remove and rework the APIs in their current form, since they are leading people into "pits of failure."
Reasons for this are listed in issue #2663.
Currently, ModelOperationsCatalog offers the following API:
So when using a loaded model, users have to create the
IDataView
to be passed to theITransformer
themselves by creating a newTextLoader
, (or another way?). I suggest adding these new APIs to ModelOperationsCatalog:The last one would return a
CompositeDataReader
containing the loader and theITransformer
chain, so we could also add new APIs toDataOperationsCatalog
to only load the reader:Another option is to add an API that creates a
PredictionEngine
from aStream
, or an API that creates aSchemaDefinition
from aStream
(that way users can use the existing API to create aPredictionEngine
).@TomFinley, what do you think?