Thanks for taking the time to put this together. Anthony is the one who has dedicated the most thought to measures, so he's best placed to answer.
Note that there is the eventual plan of having a separate MLJMeasures
package which would be a good occasion to generally improve the interface; so it's a good time to discuss this and your feedback is welcome!
Thank you @tlienart for the feedback. A separate package would be great 💯 If you feel like adding me to the organization, I could work on the proposal therein already, otherwise I can submit PRs to the repository.
oh it's not there yet so I think here is a good place to discuss what it could look like, thanks for the support!
cc: @ablaom
@juliohm
Thanks for your helpful review of the measure API. I appreciate this takes some time and effort. Thanks also for the offer to help out in an area where I agree there is room for improvement.
My first impression is that your requirements are more specialized than the needs of the general MLJ user. I hope that despite this you will appreciate that, in a broader context, the original goals of the API are generally worthwhile, and you remain willing to contribute. Let me do my best to respond to your post. I'm sorry for not responding to your comments in the same order made.
Additionally, measure implementations are not necessarily ready for automatic differentiation, nor are they ready for computation on GPUs.
I agree these are worthwhile goals. It would be helpful if you could provide examples of the shortcomings, thanks.
In another thread you mentioned type-instabilities. It would be likewise helpful if you could flag examples. (I'm more concerned with evaluation of the measures here than with instantiation thereof.)
for example yhat should include other objects like distributions, please motivate your claim that loss functions should be the mechanism to compare numbers y with distributions yhat. It is not necessarily clear that a loss function should support this comparison.
...
Am I correct to say that the existence of target_scitype and prediction_type is due to the fact that loss functions currently compare objects of different type?
Yes.
Should it be that way? Is it covering that comparison between yhat=distribution and y=number? My opinion at this moment favors a simple interface L(yhat,y) where yhat and y are scalars of the same scientific type. I understand that yhat=f(x) is the output of a learning model with target_scitype and prediction_type, but propagating this type information seems unnecessarily complex.
The output of probabilistic predictors is varied. The predicted distributions need not be parametric or even have analytic representations (e.g., generated by MCMC). For uniformity of interface, it was decided that probabilistic models in MLJ should always predict a distribution, rather than model/domain specific, ambiguously ordered, probabilities or parameters.
An important class of performance measures for probabilistic predictions are the proper scoring rules. See, e.g., this article. Some of these rules are very general in the sense that one formula, defined in terms of the pdf, defines a loss that can be applied to large families of distribution types simultaneously. An example is the Brier score which applies not just to finite distributions but to any distribution whose pdf is suitably well-behaved. So it is very natural to implement loss functions that operate on a distribution, rather than some representation the provider and consumer must agree upon on a case-by-case basis.
Here "distribution" is a little vague; if it's finite, it should be
UnivariatFinite
, any other parametric distribution should generally
be a Distribution.Distribution
object; it should at least implement
rand
and if possible pdf
.
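To make the point about pdf-based scoring rules concrete, here is a minimal sketch (not MLJ's actual cross_entropy implementation) of a log-score that accepts yhat as a distribution; it relies only on logpdf from Distributions.jl and works the same way for continuous and discrete predictions:

```julia
using Distributions

# negative log score: smaller is better; defined for any distribution with a pdf
log_loss(yhat::Distribution, y) = -logpdf(yhat, y)

log_loss(Normal(0.0, 1.0), 0.5)   # continuous probabilistic prediction
log_loss(Poisson(3.0), 2)         # discrete probabilistic prediction, same formula
```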
I understand that limitations in the main ML platforms (scikit-learn, MLR, etc.) around the performance evaluation of probabilistic predictors are a source of some frustration in the Bayesian / probabilistic programming community, and consequently a source of fragmentation between the various paradigms. The package skpro (in which yhat is allowed to be a distribution) is one response to this issue which has informed MLJ's design. See also this related article.
At present we do not implement a large number of proper scoring rules but we should like to do so at some point.
So, for our purposes, I don't agree that yhat should be restricted to a number.
I like the distinction, represented by the trait prediction_type
which ensures that deterministic measures are always applied to
deterministic observations, while probabilistic measures (e.g.,
cross_entropy) are always applied to probabilistic predictions. It
eliminates confusion and provides extra interface points for the
user. If you really want to apply a deterministic measure to a
probabilistic prediction, specify precisely how you want this to be
done. Are you computing the median? The mode? Or perhaps you are
going to have a weighted mode whose weighting is learned, etc. There
are convenience methods like predict_mode
to deal with common use
cases when evaluating a model.
distribution_type
The distribution_type trait seems to be another trait that is the result of allowing losses between objects yhat and y of different kind. Could you please elaborate on what is the meaning of this trait and how it relates to target_scitype and prediction_type?
The distribution_type
was a late addition and is not currently used
anywhere in the stack. It declares the type of probability
distribution that can be plugged in as yhat
when evaluating the
measure (e.g., UnivariateFinite
for cross_entropy
). It is
missing
if prediction_type
is not :probabilistic
, when it has no
meaning. The trait target_scitype
says nothing about the nature of
the probability distributions predicted by a model. It concerns the
target observations, rather than predictions.
Given the fact that the type of the distribution (e.g., MCMC-generated object) might not be accessible, this trait may not be universally useful. On the other hand, I don't see it does any harm.
For a supervised loss function L, we should be able to perform at least two operations:
1. Evaluate the loss at a pair (yhat, y)
2. Estimate the expected loss E[L] using a sample of n pairs: E[L] ~ (1/n) * sum(L(yhat_i, y_i))
In the second operation, we can also introduce a weighting function: the weighted expected loss is given by E[W*L] ~ (1/n) * sum(w_i * L(yhat_i, y_i)) (a sketch follows below).
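A minimal sketch of these two operations and the weighted variant, assuming a scalar loss L(yhat, y); the names are illustrative and not part of any MLJ API:

```julia
L(yhat, y) = (yhat - y)^2                                        # example scalar loss

expected_loss(L, yhat, y)    = sum(L.(yhat, y)) / length(y)      # E[L]
expected_loss(L, yhat, y, w) = sum(w .* L.(yhat, y)) / length(y) # E[W*L]
```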
Yes, I agree that it would be nice if all performance measures in
common use were defined as the mean of a per-observation measure, both
from the theoretical and practical points-of-view. But many entrenched
performance measures (absent from LossFunctions.jl) don't satisfy this
criterion. Examples include rms
and its many cousins, area under the
ROC curve, and F_β scores. (Of course one could use sums of squares
instead of rms
but general users won't want to do this). More benign
examples are things like true_positive
which count instead of
average the per-observation measurements (as they are conventionally
defined). You may criticise the use of these measures on theoretical grounds but you surely know they are ubiquitous.
We consequently take a more general point of view than you propose: A measure is a function applied to a sample, and we do not require that it be the aggregate of any function applied to individual observations.
In those cases where a measure applied to the sample can be recovered by aggregating its applications to the observations in isolation, one is allowed (and we generally should, but don't always) to implement reports_each_observation as true, which indicates that the corresponding measure method returns a vector of the per-observation measurements, instead of a single value. If the reports_each_observation trait is false, a single value is expected.
aggregation
Measures that report_each_observation
are aggregated outside of the
measures API and so we require the aggregation
trait to declare how
the per-observation measurements are to be aggregated to obtain the
correct value. Aggregation is not always by mean; rms
and
true_positive
are two of many counter examples. Furthermore, for
any measure, further aggregation occurs in resampling (e.g., CV) when
aggregates from multiple samples are themselves aggregated.
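A rough sketch of how an aggregation trait can drive this reduction; Mean, Sum and RootMeanSquare below are illustrative stand-ins, not necessarily the exact types MLJ uses:

```julia
abstract type AggregationMode end
struct Mean <: AggregationMode end
struct Sum <: AggregationMode end
struct RootMeanSquare <: AggregationMode end

aggregate(v, ::Mean)           = sum(v) / length(v)
aggregate(v, ::Sum)            = sum(v)
aggregate(v, ::RootMeanSquare) = sqrt(sum(abs2, v) / length(v))

# per-fold values from resampling are themselves aggregated the same way:
aggregate([0.12, 0.10, 0.15], Mean())
```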
Can you please elaborate on how [aggregation] is being used elsewhere in the stack?
When a model's performance is evaluated (using evaluate! or evaluate) one or more performance measures are applied to each observation in resampling (where you have a collection of train/test pairs of row indices, as in CV, for example). These per_observation measurements are aggregated to form a per_fold measurement (across the test set) and the per_fold measurements are in turn aggregated to obtain an overall measurement. For measures like auc, which do not report_each_observation, the first step is skipped (and missing reported). It is worth noting here that the per_observation scores are not discarded after aggregation, as some tuning strategies (Bayesian) make use of them. The evaluate!/evaluate methods return a named tuple with keys per_observation, per_fold, and measurement.
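In user-facing terms the flow looks roughly like this (keyword and field spellings follow the description above and may differ between MLJ versions; model, X, y are assumed defined elsewhere):

```julia
using MLJ

e = evaluate(model, X, y;
             resampling=CV(nfolds=5),
             measure=[rms, mae])

e.measurement      # overall aggregate, one entry per measure
e.per_fold         # per-fold aggregates, one vector per measure
e.per_observation  # per-observation values, or missing for measures like auc
```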
If you think it's worthwhile, I would be happy to allow the user to specify an alternative aggregation method at time of instantiation of a measure, with the trait specifying a default value.
orientation
I personally find the orientation trait suboptimal. I understand the desire to include multiple concepts (loss, score, etc.) under the same umbrella, but we lose expressivity doing so. There will be traits in the future that only make sense when orientation=:loss or orientation=:score. You already know that my vote goes for deprecating this trait, and working on separate concepts for losses, scores, etc. It doesn't mean that we need to have different trait names for these concepts, it just means that we won't be thinking about them as a single generic concept called measures. I would like to be able, for example, to replace is_measure by more specific traits in my user code like is_loss or is_score. Code that consumes losses does not necessarily consume scores, and vice-versa. So in summary, my suggestion would be to deprecate orientation, introduce is_loss, is_score, etc., and finally define a new is_measure(x) = is_loss(x) || is_score(x) for the generic check.
Sorry, I guess I'm missing some use cases here. For me any loss function becomes a scoring function if I multiply by minus one, and vice-versa. I suppose it's common to suppose a loss returns a value between 0 and 1, with 1 optimal, but I was not aware this was a universal convention or essentially used anywhere. Can you provide me with an example of an algorithm that consumes loss functions that cannot also consume scores by simply multiplying the evaluations by minus one (after testing the orientation trait)?
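The point here, sketched in code: a consumer that needs a loss can handle any measure by inspecting its orientation (the symbols :loss/:score are those used in this thread; the helper name is made up):

```julia
# smaller-is-better view of any measured value, given its orientation
as_loss_value(orientation::Symbol, value) =
    orientation == :score ? -value : value

as_loss_value(:score, 0.95)   # -0.95
as_loss_value(:loss,  0.31)   #  0.31
```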
We also want to include functions as "measures" that are neither losses nor scores. One user already requested that confusion_matrix be admissible in performance evaluation, and this has been implemented. Its orientation is :other, which means, for example, that it cannot be used in hyperparameter optimization.
reports_each_observation
I understand that the trait reports_each_observation tells whether or not a loss is returned for the whole sample or per pair in the sample. This doesn't make much sense to me in the context of loss functions, based on the definitions above about expected losses in samples. Can you please elaborate on how this trait is being used elsewhere in MLJ? I see that L1 and L2 losses for example report the values for each observation, but wouldn't it be simpler to just broadcast the equivalent scalar losses? To me this reports_each_observation trait could be deprecated as well.
The definition of this trait is given in "What is a loss function?" above.
Several MLJ measures that don't currently report each observation could do so (especially in MLJBase/src/continuous.jl) and I am happy for them to be re-factored.
If a loss function reports_each_observation
, then currently it
implements both a scalar and a vector version which I agree is
sub-optimal. In those cases, I agree it makes sense to require only an
implementation of the scalar case, and to use trait-dispatch to reduce
the vector methods to the scalar case. Of course, when
reports_each_observation
is false, a vector method (only) needs to
be implemented.
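A sketch of that refactoring, where only a scalar method is implemented and trait dispatch supplies the per-observation vector (value and call are illustrative names, not the existing MLJBase functions):

```julia
struct L2Loss end
reports_each_observation(::L2Loss) = true

# the only method a per-observation measure needs to implement:
call(m::L2Loss, ŷ, y) = (ŷ - y)^2

# trait-dispatched fallback producing the per-observation vector:
value(m, yhat::AbstractVector, y::AbstractVector) =
    value(Val(reports_each_observation(m)), m, yhat, y)
value(::Val{true}, m, yhat, y)  = call.(Ref(m), yhat, y)  # broadcast the scalar method
value(::Val{false}, m, yhat, y) = call(m, yhat, y)        # measure supplies its own vector method

value(L2Loss(), [1.0, 2.0], [1.5, 2.5])   # == [0.25, 0.25]
```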
From the definition I shared above, every loss function should support weights. The weights are not a property of the loss function itself, but a property of the expectation operator. I would just deprecate support_weights and implement the weighting mechanism outside the losses.
Yes, but your definition, as noted earlier, is too restrictive for our purposes.
Here is a proposal: We define supports_weights(m) == reports_each_observation(m) && aggregation(m) <: Union{Sum, Mean}.
Pros: No need for measures to implement supports_weights; less code, more easily maintained. Cons: Considerable refactoring. No way to specify weights for general measures, such as auc and F_β-scores. This proposal presupposes that all measures that can implement reports_each_observation indeed do so.
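Under this proposal the weighting itself would live outside the measure: once a measure reports per-observation values and aggregates by mean or sum, weights can be applied generically, along the lines of this sketch (names illustrative):

```julia
using Statistics

# per_obs: the vector of per-observation measurements; w: the observation weights
weighted_measurement(per_obs, w) = mean(w .* per_obs)

weighted_measurement([0.5, 0.1, 0.4], [2.0, 1.0, 1.0])
```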
is_feature_dependent
Some problem-specific performance measures depend on the features X as well as y, yhat. For example, in this data science competition, losses for perishable grocery items are weighted more heavily than non-perishables (and the weighting is non-linear). We provide the is_feature_dependent trait as a mechanism for communicating that a custom performance measure depends on X (so that MLJBase.value(m, yhat, X, y, w) gets dispatched properly). See MLJ docs for an example of user interaction.
Yes, this trait would be false
for all built-in measures.
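For illustration, a feature-dependent measure of the kind described could look like the sketch below, where the per-item weight is read off a column of X (the column name and weights are made up, not the competition's actual metric):

```julia
using Statistics

# yhat, y: predictions and targets; X: table with a boolean perishable column
function grocery_loss(yhat, X, y)
    w = map(p -> p ? 1.25 : 1.0, X.perishable)   # heavier penalty for perishables
    mean(w .* abs.(yhat .- y))
end

grocery_loss([10.0, 3.0], (perishable = [true, false],), [12.0, 3.0])
```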
is_measure trait
I understand that is_measure_type checks if a type is a measure type. In my opinion, the more useful trait operates on instances: is_measure. How is the trait on the type being used? Can't we just rename it to is_measure and cover both cases (type + instance)?
Yes, this is a bit untidy. The is_measure_type trait is needed for model inspection. There are two facilities for this:
1. info(M) - for returning a named-tuple of all trait values of M, where M is an instance or type.
2. measures() (or measures(some_boolean_function)) - for listing all such named tuples (on which some_boolean_function is true), in the same way that models() lists all the model metadata entries. (See MLJBase/src/measures/registry.jl for details.)
A subtle point is that these methods must filter finite lists of types, because there are generally infinitely many measure instances (some measures have parameters). So a pure-instance is_measure trait seems insufficient here, no?
The is_measure trait (which can be deduced from the other, of course) is not used elsewhere in the stack in any essential way. Our options would appear to be:
1. keep is_measure_type and simplify the current code to require implementation of is_measure_type only
2. re-factor to have only the is_measure trait (acting on instances) and lose the inspection functionality.
I can't think of a reason to prefer 2 over 1. Why do you say is_measure is more useful?
In summary:
- A performance measure (such as cross_entropy) for probabilistic predictions should, in my opinion, expect yhat to be a distribution
- All the current traits serve a well-justified purpose, with the possible exceptions of supports_weights and distribution_type
- The above design points notwithstanding, there are opportunities to reduce code and improve implementation of the design which we agree upon
The output of probabilistic predictors is varied. The predicted distributions need not be parametric or even have analytic representations (e.g., generated by MCMC). For uniformity of interface, it was decided that probabilistic models in MLJ should always predict a distribution, rather than model/domain specific, ambiguously ordered, probabilities or parameters.
This is ok, but it is not an argument in favor of the current API for losses.
An important class of performance measures for probabilistic predictions are the proper scoring rules. See, e.g., this article. Some of these rules are very general in the sense that one formula, defined in terms of the pdf, defines a loss that can be applied to large families of distribution types simultaneously. An example is the Brier score which applies not just to finite distributions but to any distribution whose pdf is suitably well-behaved. So it is very natural to implement loss functions that operate on a distribution, rather than some representation the provider and consumer must agree upon on a case-by-case basis.
I disagree with this view. Because scoring rules are something that can be used to track performance, it doesn't mean they fit in the concept of loss as traditionally used.
I understand that limitations in the main ML platforms (scikit-learn, MLR, etc) around the performance evaluation of probabilistic predictors is a source of some frustration in the Bayesian / probabilistic programming community, and consequently a source for fragmentation between the various paradigms. The package skpro (in which yhat is allowed to be a distribution) is one response to this issue which has informed MLJ's design. See also this related article. At present we do not implement a large number of proper scoring rules but we should like to do so at some point.
So, for our purposes, I don't agree that yhat should be restricted to a number.
In the referred article the authors introduce a new concept called probabilistic loss functionals, which is something different from traditional loss functions, and they make it clear. These should be two separate concepts, and this attempt to make everything fit in the same bag is the issue that I am raising. I am discussing the API of traditional supervised loss functions, and in this case, it doesn't make sense to allow yhat to be a distribution.
I like the distinction, represented by the trait prediction_type, which ensures that deterministic measures are always applied to deterministic observations, while probabilistic measures (e.g., cross_entropy) are always applied to probabilistic predictions. It eliminates confusion and provides extra interface points for the user.
I disagree. The current interface is confusing for the end user who is not interested in all kinds of performance metrics one can possibly conceive as a "measure". I only wish to evaluate my models with traditional supervised losses for a paper, and now I have to learn a complex set of traits to filter out what are the losses, what are the scores, what are the probabilistic functionals, what are the outputs that the model produces, and so on. This is unnecessarily complex.
Yes, I agree that it would be nice if all performance measures in common use were defined as the mean of a per-observation measure, both from the theoretical and practical points-of-view. But many entrenched performance measures (absent from LossFunctions.jl) don't satisfy this criterion. Examples include rms and its many cousins, area under the ROC curve, and F_β scores.
Exactly. And that is why we shouldn't be talking about rms as if it was a supervised loss as defined above (and in LossFunctions.jl). Something that doesn't fit the definition above deserves a separate API and set of traits.
We consequently take a more general point of view than you propose: A measure is a function applied to a sample, and we do not require that it be the aggregate of any function applied to individual observations.
This general view is useless in practice, because I need to know the nature of the function that I am applying to a sample. If I know that the function, for example, satisfies the definition I gave above, I can expect properties to hold. Now we have a generic thing called "measure" that puts a bunch of different concepts together in the same bag. The user is now terrified because he doesn't know which combination of traits he should use to filter things out.
Sorry, I guess I'm missing some use cases here. For me any loss function becomes a scoring function if I multiply by minus one, and vice-versa. I suppose it's common to suppose a loss returns a value between 0 and 1, with 1 optimal, but I was not aware this was a universal convention or essentially used anywhere. Can you provide me with an example of an algorithm that consumes loss functions that cannot also consume scores by simply multiplying the evaluations by minus one (after testing the orientation trait)?
For example, as I defined above, all losses for me are "weightable" because this is a property of the expectation operator and not of the loss. As you mentioned there are scoring rules which are not computed on a per-sample basis and not aggregated with an expectation operator. So I cannot use those.
Yes, but your definition, as noted earlier, is too restrictive for our purposes.
Again, I am not proposing a redefinition of measure, I am proposing a specific definition of loss. As I understand you have loss + scoring rules + whatever = performance measure, but I don't care about the rest of the list at this moment. Just the loss functions.
Unfortunately we have views of the world that are too different when it comes to software design. I am always willing to contribute to the MLJ stack, but I realize that it is very difficult to do so given that my research needs are not being addressed by the current design. I could try to adapt my viewpoint to contribute, but that is not efficient because the proposal you have, where yhat and y have different types, does not seem right conceptually, and only makes things more complex than strictly necessary. In that scenario, where I already tried to clarify my concerns with a GitHub issue as usual, I think the most productive path forward is to just fork the concepts that I am not satisfied with, as I've been doing in GeoStats.jl.
If for some reason we change our minds in the future about this design, we can try to reconcile the codebases.
I've actually just discovered that LossFunctions.jl does the weighting correctly: https://juliaml.github.io/LossFunctions.jl/stable/user/aggregate/ Sharing in case someone stumbles on the same bug here.
@juliohm could you summarize the main issues you have with this interface? None of the issues here seem irreconcilable, and I really don't want to fragment the Julia ML ecosystem the way other interfaces and ecosystems (like named arrays or automatic differentiation) have been. There may be some places where we have to create different packages, but as much as possible I think we should try to make sure everything is interoperable.
To try and give a summary of the main issues I've found:
First, it looks like you want to focus on the narrower category of proper loss functions, rather than generic loss functionals. How about we create a new type called something like "Separable loss functions" that contains only losses that can be expressed as f(mean(loss(yhat, y))), where f is monotonic and equal to the identity by default? (f is there because sometimes, adding one final function call can make the resulting loss function easier to interpret, as in RMS; however, this doesn't make a difference as long as f is monotonic.)
This way we can still allow generalized loss functionals as well (see the sketch below). Or, if you'd like, we could split this package into two packages, one for separable+proper loss functions and one for more "unusual" losses.
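A sketch of what such a "separable loss" type could look like, with a per-observation loss and the optional monotonic transform f (identity by default); rms then fits the pattern with squared error and f = sqrt. All names are illustrative:

```julia
using Statistics

struct SeparableLoss{L,F}
    loss::L    # scalar loss (yhat, y) -> Real
    f::F       # monotonic transform applied to the mean
end
SeparableLoss(loss) = SeparableLoss(loss, identity)

(m::SeparableLoss)(yhat, y) = m.f(mean(broadcast(m.loss, yhat, y)))

rms_like = SeparableLoss((ŷ, y) -> (ŷ - y)^2, sqrt)
rms_like([1.0, 2.0], [1.5, 2.5])   # == 0.5
```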
Unless we have a use case for a different aggregation method that is not the sample mean, this trait is also unnecessary.
I believe this is just a convenience for computational efficiency. It's always possible to find a function f such that invf(sum(f, x)) == accumulate(aggregator, x). For example, the logarithm to convert from products to sums. I think this could be deprecated in theory, or just pushed into some hidden corner of the documentation with a default of mean (to avoid bothering new users implementing this interface).
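Two familiar examples of that pattern (written here with mean in place of sum): the aggregate is a mean in a transformed space, followed by the inverse transform:

```julia
using Statistics

rms(x)     = sqrt(mean(abs2, x))   # f = abs2, invf = sqrt
geomean(x) = exp(mean(log, x))     # f = log,  invf = exp
```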
I could try to adapt my viewpoint to contribute, but that is not efficient because the proposal you have where yhat and y have different type does not seem right conceptually, and only makes things more complex than strictly necessary.
Can you clarify what you'd propose as an alternative interface here?
I really don't want to fragment the Julia ML ecosystem the way other interfaces and ecosystems
I sympathize with this feeling, but please understand that I had done my homework before moving forward with the development of alternative packages. Thank you for trying to revive this issue though.
How about we create a new type called something like "Separable loss functions" that contains only losses that can be expressed as f(mean(loss(yhat, y))), where f is monotonic and equal to the identity by default?
That is JuliaML/LossFunctions.jl (I am the main maintainer nowadays).
Can you clarify what you'd propose as an alternative interface here?
I disagree with many design decisions that have been made in the project, but I respect them. I don't have any intention to brainstorm MLJ interfaces at this point in time. As I mentioned in another issue, we are not using the project in our industrial applications anymore.
In case it is useful, MLJBase measures were recently moved out to StatisticalMeasures.jl. These are based on a modified system of traits that are part of StatisticalMeasuresBase.jl.
In case it is useful, MLJBase measures were recently moved out to StatisticalMeasures.jl. These are based on a modified system of traits that are part of StatisticalMeasuresBase.jl.
Oh, this is great, it looks like the two interfaces are compatible now, so I can just use StatisticalMeasures.jl with LossFunctions.jl measures. Thank you for the hard work on this, Anthony!
In the tradition of Julia, this issue follows the "Taking X seriously" convention where "X" here represents loss functions in statistical learning.
The current state of affairs of loss functions (or more generally "measures" in MLJ) is not ideal. There is a lot of code repetition that could be avoided, and a lot of machinery that could be reused in various different measures. In particular, the weighting machinery varies for different measures, and as discussed in #445 it does not serve for cost-sensitive learning, or more generally, transfer learning. Additionally, measure implementations are not necessarily ready for automatic differentiation, nor are they ready for computation on GPUs.
I would like to redesign the measures in MLJ to include all important use cases, and to facilitate future additions. For that, I need your help. Before we dive into specific questions about the current traits implemented for measures, I would like to share what I think should be the high-level abstraction for measures. The definitions below are heavily inspired by the LossFunctions.jl documentation, and by a more theoretical view on empirical risk minimization.
Let's concentrate our attention on supervised loss functions, i.e. functions L(yhat, y) that operate on scalar objects yhat and y. By scalar object I only mean an object with 0 dimensions (e.g. numbers in the real line). For now I will assume that these scalar objects are <:Real, but if you feel that for example yhat should include other objects like distributions, please motivate your claim that loss functions should be the mechanism to compare numbers y with distributions yhat. It is not necessarily clear that a loss function should support this comparison.
For a supervised loss function L, we should be able to perform at least two operations:
1. Evaluate the loss at a pair (yhat, y)
2. Estimate the expected loss E[L] using a sample of n pairs: E[L] ~ (1/n) * sum(L(yhat_i, y_i))
In the second operation, we can also introduce a weighting function:
E[W*L] ~ (1/n) * sum(w_i * L(yhat_i, y_i))
where each pair has a different weight in the final estimate. This mechanism is quite important in transfer learning, where the weights are given by the ratio of the test and train distributions w(x) = p_test(x) / p_train(x). We've formalised the process of estimating these weights in DensityRatioEstimation.jl, and we need to make sure that the loss functions API consumes them correctly.
To start this discussion, I would like to go over the existing traits for measures. First, I would like to understand how each trait is currently used in other parts of MLJ.jl. Below is the full list of traits I could find:
1. I understand that is_measure_type checks if a type is a measure type. In my opinion, the more useful trait operates on instances: is_measure. How is the trait on the type being used? Can't we just rename it to is_measure and cover both cases (type + instance)?
2. I understand that name stores the name of the measure, and that this trait is a global trait in MLJ. I like it that we can always recover the name of objects in the stack.
3. Am I correct to say that the existence of target_scitype and prediction_type is due to the fact that loss functions currently compare objects of different type? Should it be that way? Is it covering that comparison between yhat=distribution and y=number? My opinion at this moment favours a simple interface L(yhat,y) where yhat and y are scalars of the same scientific type. I understand that yhat=f(x) is the output of a learning model with target_scitype and prediction_type, but propagating this type information seems unnecessarily complex.
4. From the definition I shared above, every loss function should support weights. The weights are not a property of the loss function itself, but a property of the expectation operator. I would just deprecate supports_weights and implement the weighting mechanism outside the losses.
5. I personally find the orientation trait suboptimal. I understand the desire to include multiple concepts (loss, score, etc.) under the same umbrella, but we lose expressivity doing so. There will be traits in the future that only make sense when orientation=:loss or orientation=:score. You already know that my vote goes for deprecating this trait, and working on separate concepts for losses, scores, etc. It doesn't mean that we need to have different trait names for these concepts, it just means that we won't be thinking about them as a single generic concept called measures. I would like to be able, for example, to replace is_measure by more specific traits in my user code like is_loss or is_score. Code that consumes losses does not necessarily consume scores, and vice-versa. So in summary, my suggestion would be to deprecate orientation, introduce is_loss, is_score, etc., and finally define a new is_measure(x) = is_loss(x) || is_score(x) for the generic check.
6. I understand that the trait reports_each_observation tells whether or not a loss is returned for the whole sample or per pair in the sample. This doesn't make much sense to me in the context of loss functions, based on the definitions above about expected losses in samples. Can you please elaborate on how this trait is being used elsewhere in MLJ? I see that L1 and L2 losses for example report the values for each observation, but wouldn't it be simpler to just broadcast the equivalent scalar losses? To me this reports_each_observation trait could be deprecated as well.
7. I understand that the trait aggregation tells which aggregation method is used to combine the losses for each pair in the sample. Unless we have a use case for a different aggregation method that is not the sample mean, this trait is also unnecessary. Can you please elaborate on how it is being used elsewhere in the stack?
8. I don't understand the is_feature_dependent trait. Could you please elaborate? My guess is that it tells whether or not the loss L(yhat,y) is dependent on the feature x used to estimate yhat=f(x), but I am probably wrong because this is always the case? Also, I noticed that all current losses have this set to false.
9. I like the general docstring trait available for all objects in the MLJ stack.
10. The distribution_type trait seems to be another trait that is the result of allowing losses between objects yhat and y of different kind. Could you please elaborate on what is the meaning of this trait and how it relates to target_scitype and prediction_type?
I appreciate your time replying to all these questions, and apologise in advance if my words appear harsh. I am not a native English speaker, so I write with a reduced vocabulary that sometimes may sound aggressive to some.
If you can take a careful look at all these points, that would be extremely helpful. My current research depends on this, and the sooner I get your feedback, the faster I will be able to contribute.