thomasd3 opened 4 months ago
@luisquintanilla you are most familiar with F#, any ideas?
Okay, I think I was able to reproduce it.
> #r "nuget:Microsoft.ML";;
[Loading C:\Users\luquinta\.packagemanagement\nuget\Projects\12360--b816f9ce-1002-44bf-bd7f-bd9f8ee041a6\Project.fsproj.fsx]
module FSI_0002.Project.fsproj
> open Microsoft.ML;;
> let data = [[|1f;2f;3f|];[|4f;5f;6f|]];;
val data: float32 array list = [[|1.0f; 2.0f; 3.0f|]; [|4.0f; 5.0f; 6.0f|]]
> let ctx = new MLContext();;
val ctx: MLContext
> let dv = ctx.Data.LoadFromEnumerable data;;
System.ArgumentOutOfRangeException: Could not determine an IDataView type and registered custom types for member SyncRoot (Parameter 'rawType')
at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(String name, Type rawType, IEnumerable`1 attributes, Boolean& isVector, Type& itemType)
at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(MemberInfo memberInfo, Boolean& isVector, Type& itemType)
at Microsoft.ML.Data.SchemaDefinition.Create(Type userType, Direction direction)
at Microsoft.ML.Data.InternalSchemaDefinition.Create(Type userType, Direction direction)
at Microsoft.ML.Data.DataViewConstructionUtils.CreateFromEnumerable[TRow](IHostEnvironment env, IEnumerable`1 data, SchemaDefinition schemaDefinition)
at Microsoft.ML.DataOperationsCatalog.LoadFromEnumerable[TRow](IEnumerable`1 data, SchemaDefinition schemaDefinition)
at <StartupCode$FSI_0006>.$FSI_0006.main@() in C:\Users\luquinta\stdin:line 5
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
Stopped due to error
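This is probably happening because ML.NET uses the names of the row type's public members (fields and properties) as column names. When the row type is a raw float32 array, those members are the properties of System.Array, such as SyncRoot (typed as obj), and ML.NET cannot map that to an IDataView type.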
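To see which members are involved when the row type is a raw array, you can list the public instance properties of float32[] with plain .NET reflection. This is a small sketch that does not call ML.NET at all; it only assumes, as described above, that the row type's public members are what get mapped to columns.

```fsharp
open System.Reflection

// Public instance properties of float32[] (inherited from System.Array).
// With a raw array as the row type, these members, not the array's
// elements, are what a reflection-based schema would turn into columns.
let arrayProps =
    typeof<float32[]>.GetProperties(BindingFlags.Public ||| BindingFlags.Instance)
    |> Array.map (fun p -> p.Name, p.PropertyType.Name)

for (name, typeName) in arrayProps do
    printfn "%s : %s" name typeName
```

The list includes Length (an Int32) and SyncRoot (an Object); SyncRoot is the member the exception reports it cannot map.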
If you use an anonymous record instead, you can do this and it works:
> let dataB = [{|x=[|1f;2f;3f|]|};{|x=[|4f;5f;6f|]|}];;
val dataB: {| x: float32 array |} list =
[{ x = [|1.0f; 2.0f; 3.0f|] }; { x = [|4.0f; 5.0f; 6.0f|] }]
> let dv = ctx.Data.LoadFromEnumerable dataB;;
val dv: IDataView
@thomasd3, your method also works if you're using records. However, you could use this overload so you don't have to specify every column individually.
Alternatively, you could use slicing notation for LoadColumn:
LoadColumn(cols[|1..|])
@luisquintanilla, I assemble several float32 arrays into one, and depending on the model being trained, the combined array has a different length (the code is quite generic).
In this case:
type DataRow =
    {
        [<ColumnName "Label"; LoadColumn(0)>]
        Label: bool
        [<ColumnName "Features"; LoadColumn(Features[|1..|])>]
        Features: float32 array
    }
How can I set up LoadColumn for a variable number of columns?
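When the data comes from a text file and the number of feature columns is only known at runtime, one option is to skip the attributes and describe the columns with TextLoader.Column values, whose index ranges are ordinary runtime values. A minimal sketch, assuming an F# script that references the Microsoft.ML NuGet package; the sample rows, the 0/1 label encoding, and the temp file name are hypothetical.

```fsharp
#r "nuget: Microsoft.ML"

open System.Globalization
open System.IO
open Microsoft.ML
open Microsoft.ML.Data

let ctx = MLContext()

// Hypothetical rows: first value is the label (0/1), the rest are features.
let rows = [ [| 1f; 0.5f; 0.1f; 0.9f |]; [| 0f; 0.2f; 0.3f; 0.4f |] ]
let featureCount = rows.Head.Length - 1 // only known at runtime

// Write a small CSV so the example is self-contained (hypothetical file name).
let path = Path.Combine(Path.GetTempPath(), "variable-columns-demo.csv")
let toLine (row: float32[]) =
    row |> Array.map (fun v -> v.ToString(CultureInfo.InvariantCulture)) |> String.concat ","
File.WriteAllLines(path, rows |> List.map toLine |> List.toArray)

// Columns are described at runtime, so the feature range (inclusive)
// can depend on the data instead of a compile-time attribute argument.
let columns =
    [| TextLoader.Column("Label", DataKind.Boolean, 0)
       TextLoader.Column("Features", DataKind.Single, 1, featureCount) |]

let dv = ctx.Data.LoadFromTextFile(path, columns, separatorChar = ',', hasHeader = false)

let featuresType = dv.Schema.["Features"].Type :?> VectorDataViewType
printfn "Features: %d values, known size = %b" featuresType.Size featuresType.IsKnownSize
```

Because the range 1..featureCount is an explicit column range, the Features column should come out as a vector of known size.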
Additionally, I tried this:
let aa =
    request.Data
    |> Json.deserialize<float32 array list>
    |> List.map (fun x -> {| Label = x[0]; Features = x[1..] |})
let loadedData = context.Data.LoadFromEnumerable aa
This loads the data properly. Then, in the training code:
// shuffle the data
let shuffledData = context.Data.ShuffleRows (loadedData)
// cache the data
let data = context.Data.Cache(shuffledData)
// define the pipeline
let settings = BinaryExperimentSettings()
settings.MaxExperimentTimeInSeconds <- uint request.TimeAllowed.TotalSeconds
// create the experiment
let experiment = context.Auto().CreateBinaryClassificationExperiment(settings)
// train the model
let result = experiment.Execute (data, labelColumnName = "Label")
but then I get this error:
Schema mismatch for feature column 'Features': expected Vector, got VarVector (Parameter 'inputSchema')
If I try to make the Vector of fixed length, using:
|> List.map (fun x -> {| Label = x[0]; Features = Vector<float32>(x[1..]) |})
then I get:
Could not determine an IDataView type and registered custom types for member Features (Parameter 'rawType')
when loading the data.
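The "expected Vector, got VarVector" mismatch can be checked by loading rows shaped like the anonymous records above and inspecting the Features column type in the resulting schema. A minimal sketch, assuming an F# script with the Microsoft.ML NuGet package; the two sample rows are made up.

```fsharp
#r "nuget: Microsoft.ML"

open Microsoft.ML
open Microsoft.ML.Data

let ctx = MLContext()

// Hypothetical rows in the same shape: label first, then the features.
let rows = [ [| 1f; 2f; 3f |]; [| 0f; 5f; 6f |] ]
let records = rows |> List.map (fun x -> {| Label = x.[0]; Features = x.[1..] |})
let dv = ctx.Data.LoadFromEnumerable(records)

// A float32[] member with no VectorType attribute gets a vector type whose
// size is unknown; this corresponds to the "VarVector" in the AutoML error.
let featuresType = dv.Schema.["Features"].Type :?> VectorDataViewType
printfn "%O, known size = %b" featuresType featuresType.IsKnownSize
```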
I think this is happening because you need to add the VectorType attribute to your Features column. In it, you specify the number of columns.
type DataRow =
    {
        [<ColumnName "Label"; LoadColumn(0)>]
        Label: bool
        [<ColumnName "Features"; LoadColumn(Features[|1..|]); VectorType(20)>]
        Features: float32 array
    }
Here I used 20, but you would set it to however many feature columns you have.
In this case, it won't compile.
What's Features in your example? Is it an array containing the indices you want to load?
It’s a float32 array with all the training data for that row
Okay, I think that might be the issue. That needs to be an array of column indices, not the data itself.
You can also use the overload LoadColumn(int start, int end), where start is the index of the column where your data begins and end is the index of the last column you want to read.
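A minimal sketch of the LoadColumn(start, end) overload on an F# record, loaded from a text file. It assumes an F# script with the Microsoft.ML NuGet package; the file name, the data, and the three-feature layout are hypothetical. Both attribute arguments have to be compile-time constants.

```fsharp
#r "nuget: Microsoft.ML"

open System.IO
open Microsoft.ML
open Microsoft.ML.Data

// Column 0 holds the label; columns 1 through 3 (inclusive) are read into
// one feature vector. Attribute arguments must be compile-time constants.
[<CLIMutable>]
type DataRow =
    { [<LoadColumn(0)>]
      Label: bool
      [<LoadColumn(1, 3); VectorType(3)>]
      Features: float32[] }

// Hypothetical data file: label (0/1) followed by three features.
let path = Path.Combine(Path.GetTempPath(), "loadcolumn-range-demo.csv")
File.WriteAllLines(path, [| "1,0.5,0.1,0.9"; "0,0.2,0.3,0.4" |])

let ctx = MLContext()
let dv = ctx.Data.LoadFromTextFile<DataRow>(path, separatorChar = ',', hasHeader = false)

let featuresType = dv.Schema.["Features"].Type :?> VectorDataViewType
printfn "Features: %d values" featuresType.Size
```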
Ah yes, that compiles! I didn't know that overload was there. Thanks a lot!
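LoadColumn and VectorType both fix the vector length in the type at compile time. For the in-memory case from earlier (one float32 array per row, with a feature count that depends on the model), another option is to keep LoadFromEnumerable and set a known vector size on a SchemaDefinition at runtime. A minimal sketch, assuming an F# script with the Microsoft.ML NuGet package; the sample data is made up, and converting the label with x.[0] <> 0f assumes it is encoded as 0/1.

```fsharp
#r "nuget: Microsoft.ML"

open Microsoft.ML
open Microsoft.ML.Data

// Row type without attributes; the vector size is set at runtime below.
[<CLIMutable>]
type DataRow = { Label: bool; Features: float32[] }

let ctx = MLContext()

// Hypothetical input: one float32 array per row, label first (assumed 0/1).
let raw = [ [| 1f; 0.5f; 0.1f; 0.9f |]; [| 0f; 0.2f; 0.3f; 0.4f |] ]
let rows = raw |> List.map (fun x -> { Label = x.[0] <> 0f; Features = x.[1..] })

// Attribute arguments must be constants, so give the Features column a
// known-size vector type through a SchemaDefinition instead.
let featureCount = rows.Head.Features.Length
let schemaDef = SchemaDefinition.Create(typeof<DataRow>)
schemaDef.["Features"].ColumnType <-
    (VectorDataViewType(NumberDataViewType.Single, featureCount) :> DataViewType)

let dv = ctx.Data.LoadFromEnumerable(rows, schemaDef)
let featuresType = dv.Schema.["Features"].Type :?> VectorDataViewType
printfn "Features: %d values, known size = %b" featuresType.Size featuresType.IsKnownSize
```

The resulting IDataView could take the place of loadedData in the training code above; since its Features column has a known size, it should avoid the "expected Vector, got VarVector" mismatch without going through a CSV file. Every row's array must have featureCount elements.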
As a quick side question, can I also load a model from memory? I'm currently loading it from a DB (where I've stored the zip file), and I have to save it to disk before I can load it:
let loadModelAsync (postgresConnectionString: string) (coreName: string) (coreVersion: int) (ticker: Ticker) (intervals: string) (modelName: string) =
    asyncResultOption {
        // get the model from the database
        let! model = MLModelDatabase.getModelAsync postgresConnectionString coreName coreVersion ticker intervals modelName
        // build a filename
        let modelDataFilename = nanoGuid()
        // write the model to a file
        do! File.WriteAllBytesAsync(modelDataFilename, model.Model)
        // create the MLContext
        let context = MLContext()
        // load the model
        let model, _ = context.Model.Load modelDataFilename
        // erase the file
        File.Delete modelDataFilename
        // return the model
        return model
    }
This is similar to the problem I had with the training data.
I guess 'LoadWithDataLoader' is probably worth investigating.
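ML.NET's model catalog also has Save and Load overloads that take a Stream, so bytes read from a database can be wrapped in a MemoryStream instead of going through a temporary file. A minimal, self-contained sketch, assuming an F# script with the Microsoft.ML NuGet package; the tiny min-max normalizer only stands in for a real trained model, and the Row type is hypothetical.

```fsharp
#r "nuget: Microsoft.ML"

open System.IO
open Microsoft.ML
open Microsoft.ML.Data

[<CLIMutable>]
type Row = { X: float32 }

let ctx = MLContext()
let data = ctx.Data.LoadFromEnumerable([ { X = 1f }; { X = 3f } ])

// Stand-in for a trained model: a fitted min-max normalizer.
let model = ctx.Transforms.NormalizeMinMax("X").Fit(data)

// Save to a byte array instead of a file (e.g. the zip stored in the database).
let modelBytes =
    use ms = new MemoryStream()
    ctx.Model.Save(model, data.Schema, ms)
    ms.ToArray()

// Load straight from the bytes; no temporary file is written.
let loadedModel, inputSchema =
    use ms = new MemoryStream(modelBytes)
    ctx.Model.Load(ms)

let transformed = loadedModel.Transform(data)
```

In loadModelAsync, the write/load/delete steps could then be replaced by wrapping model.Model in a MemoryStream and passing that stream to context.Model.Load.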
Using a BinaryClassifier with AutoML in F#, I have my data structured as this type:
So each row is a float32 array, and I have a list of rows. The first column is the label; all other columns are the features.
Using context.Data.LoadFromEnumerable() does not work on this data type. Even though the list implements IEnumerable, I can't use that function.
For now, I'm using something very ugly: I write the data to a CSV file and load it from disk:
The row is defined as:
which makes very little sense...
and the loading code is even worse:
How can I use LoadFromEnumerable from a list to avoid this?