Open fwaris opened 2 years ago
I just found out that one workaround for the above issue is to set "ignoreMissingColumns=true" when calling CreateEnumerable(...)
Glad you found a workaround, but yeah that isn't the behavior we want I don't think.
When did you notice this change? We haven't made a recent changes to SchemaDefinition.Create
that I am aware of. I wonder if it has to do with a .NET version change? What version are you using now? Could you also try an older version?
I think the change was F# tooling rather than ML.Net (rethinking the issue). Some newer version of F# (maybe 5.x) changed the IL generation for mutable records, surfacing this issue.
Totally agree that only public fields should be considered for serialization/deserialization.
Note also that, as-is, when creating a DataView (LoadFromEnumerable…) , the xxx@ fields also get added into the dataset, bloating its size.
Sent from Mailhttps://go.microsoft.com/fwlink/?LinkId=550986 for Windows
From: Michael @.> Sent: Monday, June 13, 2022 1:51 PM To: @.> Cc: @.>; @.> Subject: Re: [dotnet/machinelearning] SchemaDefinition.Create picks up internal fields instead of just public only - affects F# (Issue #6209)
Glad you found a workaround, but yeah that isn't the behavior we want I don't think.
When did you notice this change? We haven't made a recent changes to SchemaDefinition.Create that I am aware of. I wonder if it has to do with a .NET version change? What version are you using now? Could you also try an older version?
— Reply to this email directly, view it on GitHubhttps://github.com/dotnet/machinelearning/issues/6209#issuecomment-1154210278, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AACGZMFVW53NGRAHXS6NG2TVO5YKFANCNFSM5XKT6S2Q. You are receiving this because you authored the thread.Message ID: @.***>
Since this is a change by the F# tooling there is nothing that we can do about it unfortunately. Closing this for now, but if things change in the future please create a new issue and we will take another look.
@michaelgsharp Just to check....
There are some situations where F# needs to emit public fields - notably when emitting code into multiple assemblies in F# Interactive.
My understanding is that serialization of fields should ignore
This is what System.Json and Newtonsoft.Json both do. Does SchemaDefinition
do that? It would be great if it does.
See also https://github.com/dotnet/fsharp/pull/13494 which is related
@fwaris Do you know if you hit this issue when using F# Interactive?
Thanks Don
@dsyme let me reopen the issue for now while I look into that. I am not actually sure off of the top of my head.
Just in case, it is still an issue as all ML.NET guides advice to use records and they are breaking in F# interactive environment.
One have to know about hacky workaround passing --multiemit-
to FSI which is not a common knowledge
Check out MLUtils that has some utility code to help use ML.Net from F#. You can use cleanSchema to remove '@' fields from the default schema
#r "nuget: MLUtils"
open MLUtils
type NRow = { Field1:int; ...}
let xs : NRow list = []
let dv = ctx.Data.LoadFromEnumerable(xs,Schema.cleanSchema typeof<NRow>)
Going the other way - from IDataView to F# record - ignoreMissingColumns flag works.
ctx.Data.CreateEnumerable<NRow>(dv,false,ignoreMissingColumns=true)
@michaelgsharp Just to check....
There are some situations where F# needs to emit public fields - notably when emitting code into multiple assemblies in F# Interactive.
My understanding is that serialization of fields should ignore
- internal fields
- private fields
- public fields marked "CompilerGenerated".
This is what System.Json and Newtonsoft.Json both do. Does
SchemaDefinition
do that? It would be great if it does.See also dotnet/fsharp#13494 which is related
@fwaris Do you know if you hit this issue when using F# Interactive?
Thanks Don
@dsyme, sorry, did not see this until now. There is a workaround. See the last few messages in this conversation for some notes on the MLUtils package.
This used to work before but now I cannot use CreateFromEnumerable in F# now.
In F#, we define mutable classes by annotating F# 'record' types with CLIMutable:
F# compiler generates IL that looks as follows:
The SchemaDefinition.Create method picks up both "Data@" and "Data" as fields required by the schema. It should only pickup the public fields.
Please add unit tests for F# as well so these issues are caught earlier.
@dsyme