JonasBchrt / snapshot-gnss-algorithms

Algorithms for location estimation based on short GNSS snapshots.
ISC License

bayes_snapper.npy outdated and undocumented #7

Open geissdoerfer opened 6 months ago

geissdoerfer commented 6 months ago

The repository contains a bayes_snapper.npy that apparently holds a Bayesian model for rating satellite quality. Unfortunately, the model seems to have been generated with an outdated scikit-learn version that is no longer compatible with recent Python. Would you be able to provide the data and the method used to generate the model? This would benefit reproducibility. Thanks!
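For context on why the file is version-sensitive: a .npy that stores a Python object (such as a fitted scikit-learn estimator) is serialized via pickle, so loading it requires allow_pickle=True and the exact class definitions of the library version that wrote it. A minimal sketch of the mechanism, using a plain parameter dict rather than the actual (unknown) model object:

```python
import os
import tempfile

import numpy as np

# Plain numeric parameters are version-independent data; an estimator
# object in their place would tie the file to one scikit-learn version.
params = {"mu_in": 44.0, "sigma_in": 3.0}  # hypothetical values

path = os.path.join(tempfile.mkdtemp(), "bayes_snapper_params.npy")
np.save(path, params)  # a dict is stored as a pickled object array

# Object arrays can only be restored with allow_pickle=True;
# .item() unwraps the 0-d object array back into the dict.
restored = np.load(path, allow_pickle=True).item()
```

Exporting the fitted means, standard deviations, and priors as plain numbers like this (or as CSV/JSON) would sidestep the pickle compatibility problem entirely.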

JonasBchrt commented 6 months ago

I will have a look - give me a few days.

Some preliminary notes:

This is from my thesis:

At first, it derives a prior probability P(v_i = 1 | SNR_i) for each satellite observation i ∈ {1, …, N} to be reliable, i.e., to be a so-called inlier, given the associated SNR. The distribution p(SNR_i) of the SNRs is modelled as a Gaussian mixture model with two components, p(SNR_i | v_i = 1) for the inliers and p(SNR_i | v_i = 0) for the outliers. Mean, standard deviation, and prior of each component are fitted to a training dataset. This is done separately for each GNSS, since the GPS L1 signal, the Galileo E1 signal, and the BeiDou B1C signal have different properties and, therefore, differently distributed SNRs. Using the resulting probabilistic models and Bayes' rule, the priors P(v_i = 1 | SNR_i) = p(SNR_i | v_i = 1) P(v_i = 1) / p(SNR_i) for each satellite to be an inlier and P(v_i = 0 | SNR_i) = 1 − P(v_i = 1 | SNR_i) to be an outlier are obtained.
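The Bayes'-rule step above can be sketched with plain NumPy. All numeric parameters below are hypothetical stand-ins for the values fitted to the training data:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def inlier_posterior(snr, mu_in, sigma_in, mu_out, sigma_out, p_in):
    """P(v = 1 | SNR) via Bayes' rule for a two-component Gaussian mixture.

    p_in is the mixture prior P(v = 1); the outlier prior is 1 - p_in.
    The denominator is the marginal p(SNR) = sum over both components.
    """
    num = gaussian_pdf(snr, mu_in, sigma_in) * p_in
    den = num + gaussian_pdf(snr, mu_out, sigma_out) * (1.0 - p_in)
    return num / den

# Hypothetical parameters for one GNSS (e.g. GPS L1); each constellation
# would get its own fitted set, as described above.
posterior = inlier_posterior(snr=45.0, mu_in=44.0, sigma_in=3.0,
                             mu_out=30.0, sigma_out=5.0, p_in=0.7)
```

A high SNR near the inlier component's mean yields a posterior close to 1, while an SNR near the outlier mean yields one close to 0; P(v = 0 | SNR) follows as the complement.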

Footnote:

Technically, SNRs are strictly positive while a Gaussian distribution’s support includes all non-positive numbers, too. However, a Gaussian distribution is chosen because the probability contained in the distribution’s tail that extends into the negative numbers is negligibly small for the considered problem in practice. In addition, efficient algorithms for inference exist for Gaussian distributions.

geissdoerfer commented 6 months ago

Thanks for the explanation, it makes sense. A table of the labelled training data (CSV?) and a script to train the model, added to the repository, would be very helpful!
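Since the training data is labelled, fitting the mixture reduces to computing per-label statistics; no EM step is needed. A minimal sketch of such a training script, assuming a CSV with columns gnss, snr, inlier (all column names are guesses about the not-yet-published data):

```python
import csv
import io

import numpy as np

def fit_snr_model(csv_text):
    """Fit per-GNSS inlier/outlier Gaussian parameters from labelled SNRs.

    Expects columns: gnss, snr, inlier (1 = inlier, 0 = outlier).
    Returns {gnss: {mu_in, sigma_in, mu_out, sigma_out, p_in}}, i.e. the
    mean, standard deviation, and prior of each mixture component.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    model = {}
    for gnss in sorted({r["gnss"] for r in rows}):
        snrs = {v: np.array([float(r["snr"]) for r in rows
                             if r["gnss"] == gnss and r["inlier"] == v])
                for v in ("1", "0")}
        model[gnss] = {
            "mu_in": snrs["1"].mean(), "sigma_in": snrs["1"].std(ddof=1),
            "mu_out": snrs["0"].mean(), "sigma_out": snrs["0"].std(ddof=1),
            "p_in": len(snrs["1"]) / (len(snrs["1"]) + len(snrs["0"])),
        }
    return model

# Tiny made-up example; the real table would hold many observations
# per constellation (GPS L1, Galileo E1, BeiDou B1C).
csv_text = """gnss,snr,inlier
GPS,44,1
GPS,46,1
GPS,30,0
GPS,28,0
"""
model = fit_snr_model(csv_text)
```

Storing the resulting plain-number parameters (e.g. via json or np.savez) instead of a pickled estimator would keep the model loadable regardless of the installed scikit-learn version.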