Open akprasad opened 6 months ago
How about changing the classify
method entirely into a classify_vrtta
method that returns a single result (which now also has a quality parameter). We can have an analogous classify_jati
method for Jati metres.
pub fn classify_vrtta(&self, text: impl AsRef<str>) -> MatchResult {}
pub fn classify_jati(&self, text: impl AsRef<str>) -> MatchResult {}
classify_all
can then just return the best n results of Partial Matches by updating them in some data structure while iterating:
pub fn classify_all(&self, text: impl AsRef<str>, best_n: usize) -> Vec<MatchResult> {}
pub struct MatchResult {
matched: Option<Matched>,
match_type: Vec<MatchType>,
aksharas: Vec<Vec<Akshara>>,
}
pub enum Matched {
Vrtta(Vrtta),
Jati(Jati),
}
pub enum MatchType {
Partial(usize), // Representing Quality of match
Prefix,
Pada,
Full,
}
@akprasad
Thanks for the suggestions!
Having both classify_vrtta
and classify_jati
feels too complex -- an end user won't necessarily know whether the meter is a vrtta or jAti, and we should make their life easier with a simple classify
endpoint so that they don't have to make two calls.
best_n
is a nice idea but puts us in the position of having to decide what "best" means, but this seems dependent on the use case. classify_all
without best_n
is less sophisticated, but the return data is also more obvious and deterministic.
I do like the idea of putting usize
and other fields on MatchType
, e.g. Prefix(usize)
to store the length of the prefix match. Partial
is unclear to me. Perhaps MatchType
should be split into a Scope
(prefix, pada, full) and a Type
(exact, EditDistance(usize), ...)
Having both
classify_vrtta
andclassify_jati
feels too complex -- an end user won't necessarily know whether the meter is a vrtta or jAti, and we should make their life easier with a simpleclassify
endpoint so that they don't have to make two calls.
Can't classify_all
to be that endpoint? best_n
can be defaulted to 1 (and first the Jati match is checked before vrtta match)?
best_n
is a nice idea but puts us in the position of having to decide what "best" means, but this seems dependent on the use case.classify_all
withoutbest_n
is less sophisticated, but the return data is also more obvious and deterministic.
I felt when the database consists of several hundreds of schemes, it would be apt to ask for the best 5 or so than a match against every single scheme in the data. (If we're providing % matches it might match against many meters)
best_n can be defaulted to 1
Sadly Rust doesn't support default arguments, so there's no way to both have a default value and reduce friction for the user. (There are workarounds like having an Option<usize>
, but then the caller needs to pass None
explicitly.)
when the database consists of several hundreds of schemes
Good point!
So perhaps:
pub fn classify(&self, text: impl AsRef<str>) -> Matches {
self.classify_all(text.as_ref(), 1)
}
pub fn classify_all(&self, text: impl AsRef<str>, n: usize) -> Matches { ... }
pub struct Match {
// TODO: vrtta/jati, match type, etc.
}
pub struct Matches {
matches: Vec<Match>,
aksharas: Vec<Vec<Akshara>>,
}
pub fn classify(&self, text: impl AsRef
) -> Matches { self.classify_all(text.as_ref(), 1) }
What kind of response struct should be returned? If classify is just returning one response, should it be the one giving Pada match / Prefix Match / None(edit_distance or any other metric) in case there is not Full match?
How about something like:
pub enum Padya {
Vrtta(Vrtta),
Jati(Jati),
}
// I don't like this name
pub enum MatchStrategy {
Exact,
EditDistance(usize)
}
pub enum MatchScope {
None,
Prefix,
Pada,
Full,
}
pub struct Match {
pada: Padya,
strategy: MatchStrategy,
scope: MatchScope,
}
pub struct Matches {
matches: Vec<Match>,
aksharas: Vec<Vec<Akshara>>,
}
Then both methods can return a Matches
. classify
can return a Matches
where the matches
field has length at most 1.
Problem
Chandas
contains aclassify
method that returns a single result. But sometimes it is interesting to look at all possible results and let the user decide how to use them.Proposed solution
Create a
classify_all
method that returns all padyas (vrttas + jatis) that don't have aNone
match.