HarryMcCarney closed this issue 1 year ago
I will have a look at this. Maybe there is a performance advantage if you explicitly restrict it to float. If so, there should be additional "generic" functions. I'll test it and make the functions usable for "non-float" lists as well.
Looking up letters that do not occur in your text is hard to work around inside the module, because there are many possible alphabets that could be considered (upper case, lower case, äüö, special characters, numbers). I assume you have to define your desired set of characters separately:
let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()
With this at hand, you can use it as a template and fill in the counts of the characters that occur in your text.
#r "nuget: FSharp.Stats"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET
let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()
let myTextMap =
    "mississippi".ToCharArray()
    |> List.ofArray
    |> Frequency.createGeneric
let myFinalMap =
    // use your own defined alphabet to include the desired set of characters
    myAlphabet
    |> Array.map (fun key ->
        // if the text contains the current character, its count is used
        if myTextMap.ContainsKey key then
            key,myTextMap.[key]
        // if the text does NOT contain the current character, set its count to 0
        else
            key,0
        )
    |> Map.ofArray
// access the character counts
myFinalMap.['z'] // 0
myFinalMap.['s'] // 4
// visualization
myFinalMap
|> Map.toArray
|> Chart.Column
|> Chart.withSize (1000.,500.) // wide enough to depict all characters
|> Chart.show
I'll comment if I have any news.
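If only individual lookups are needed, a lighter alternative to building the full alphabet map is to fall back to zero at lookup time. The following is a minimal sketch that uses only FSharp.Core; `countOrZero` is a hypothetical helper name and not part of FSharp.Stats.

```fsharp
// Hypothetical helper (not part of FSharp.Stats):
// returns the stored count, or 0 if the key is absent from the map
let countOrZero (key: 'a) (counts: Map<'a, int>) : int =
    counts
    |> Map.tryFind key
    |> Option.defaultValue 0

// Count the characters of the text with FSharp.Core only
let counts =
    "mississippi"
    |> Seq.countBy id
    |> Map.ofSeq

printfn "%d" (countOrZero 's' counts) // 4
printfn "%d" (countOrZero 'z' counts) // 0
```

Unlike the template approach, this does not put the absent characters into the map, so they would not show up in a chart.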
I fixed the issue, tested the Empirical.create function, and added a convenience layer for nominal/categorical inputs.
32fa0c23f2629dd9c149b4d98bc9c0befea86ad2
060f696a9e8f8bad7542bf35bb5ba885f560d574
7c1242dbe65710142e70e3c823bb46afeacafffd
still missing
You can build the binaries yourself or wait for the next FSharp.Stats release.
(Update: You can use #r "nuget: FSharp.Stats, 0.4.12-preview.1")
Define the set of characters to search for:
#r @"<PathToFSharp.Stats>\FSharp.Stats\src\FSharp.Stats\bin\Release\netstandard2.0\FSharp.Stats.dll"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET
let letters = "Mississippi"
// Define your set of characters that should be checked for
// Any character that is not present in these sets is ignored
let myAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" |> Set.ofSeq
let mySmallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
These alphabets can be used to create the probability maps.
//takes the characters and determines their probabilities without considering non-existing characters
let myFrequencies0 = EmpiricalDistribution.createNominal() letters
//takes upper and lower case characters and determines their probability
let myFrequencies1 = EmpiricalDistribution.createNominal(Template=myAlphabet) letters
//takes only lower case characters and determines their probability
let myFrequencies2 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet) letters
An additional Transform argument that is applied to the input sequence may be beneficial if it does not matter whether a character is lower case or upper case:
//converts all characters to lower case characters and determines their probability
let myFrequencies3 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet,Transform=System.Char.ToLower) letters
// check probability of non existing characters, that are within the search scope (Template alphabet)
myFrequencies3.['z'] //returns 0.0
[
    Chart.Column(myFrequencies0 |> Map.toArray,"noTemplate") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies1 |> Map.toArray,"bigAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies2 |> Map.toArray,"smallAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies3 |> Map.toArray,"toLower + smallAlphabet") |> Chart.withYAxisStyle "probability"
]
|> Chart.Grid(4,1)
|> Chart.withTemplate ChartTemplates.lightMirrored
|> Chart.withTitle letters
|> Chart.withSize(1000.,900.)
|> Chart.show
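To make the roles of the template and the transform concrete, here is a rough sketch of the idea in plain FSharp.Core. It is not the FSharp.Stats implementation: `nominalProbabilities` is a hypothetical name, and normalizing by the number of items that remain after filtering is an assumption about the intended behavior.

```fsharp
// Hypothetical sketch, NOT the FSharp.Stats implementation.
// Assumption: probabilities are relative to the items kept after filtering.
let nominalProbabilities (template: Set<'a>) (transform: 'a -> 'a) (data: seq<'a>) : Map<'a, float> =
    // apply the transform first, then drop everything outside the template
    let kept =
        data
        |> Seq.map transform
        |> Seq.filter (fun x -> Set.contains x template)
        |> Seq.toArray
    let total = float kept.Length
    let counts = kept |> Array.countBy id |> Map.ofArray
    // every template item gets an entry; absent items get probability 0.0
    template
    |> Seq.map (fun key ->
        let count = counts |> Map.tryFind key |> Option.defaultValue 0
        // guard against division by zero if nothing matched the template
        key, (if total > 0. then float count / total else 0.))
    |> Map.ofSeq

let lowerCase = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq

let probs =
    nominalProbabilities lowerCase (fun (c: char) -> System.Char.ToLowerInvariant c) "Mississippi"

printfn "%f" probs.['s'] // 4 of the 11 kept characters
printfn "%f" probs.['z'] // in the template but not in the text
```

Because every template character receives an entry, a lookup such as `probs.['z']` yields 0.0 rather than raising a KeyNotFoundException.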
A prerelease is published and can be used:
#r "nuget: FSharp.Stats, 0.4.12-preview.1"
The documentation that contains the same information as this thread can be found here.
Thanks Benedikt, nice solution!
I can create this
But then I can't get the probability for a specific value, as all functions except ofHistogram take a float as the map key. I can work around this by querying the map directly with letters["i"], but then letters["z"] returns an error instead of zero.
I would prefer to use probabilityAt, but it expects a Map<float,float>. Should this function be generic, or have I missed something?
thanks