bioshot-dotcom closed this issue 1 year ago
Hi!
This is the relevant portion of the docstring (not sure whether you had already seen it): https://github.com/giotto-ai/giotto-tda/blob/7b3e47d7debd48730dc96b49d39dce300625d793/gtda/mapper/cover.py#L35-L39
Let us know if you still have questions.
Thank you, yes actually I had already seen it, and I had also seen this:
In the case of a balanced cover, :meth:`left_limits_` and
:meth:`right_limits_` are computed as follows given a training array `X`:
first, entries in `X` are ranked in ascending order, starting at 1 and
with the same rank repeated in the case of equal values; then, the closed
interval :math:`(0.5, N + 0.5)`, where :math:`N` is the maximum
rank observed, is covered uniformly with parameters `n_intervals` and
`overlap_frac`, yielding intervals :math:`(\\alpha_k, \\beta_k)`;
the final cover is made of intervals :math:`(a_k, b_k)` where, for
:math:`k > 1` (resp. :math:`k < ` `n_intervals`), :math:`a_k` (resp.
:math:`b_k`) is the value of any entry in `X` ranked as the floor (
resp. ceiling) of :math:`\\alpha_k` (resp. :math:`\\beta_k`).
So, for example, if my input is X=[1,1,2,3,3,5,7,7,8,9,9,18,27] and the cover kind is uniform with given n_intervals and overlap_frac, my intervals will be x1=[1,1,2], x2=[2,3,3], x3=[3,5,7], x4=[7,7,8], x5=[8,9,9], x6=[9,18,27]. What are the intervals in the case of kind='balanced'?
Hi! I apologize for the slow reply. Here is your example and the intervals computed by the cover:
import numpy as np
from gtda.mapper import OneDimensionalCover

n_intervals = 6
overlap_frac = 0.2
cover = OneDimensionalCover(kind='balanced', n_intervals=n_intervals,
                            overlap_frac=overlap_frac)
X = np.array([1, 1, 2, 3, 3, 5, 7, 7, 8, 9, 9, 18, 27])

# Fit the cover on X, then get the boolean membership array
cover.fit(X)
y = cover.transform(X)
print(f"- Cover:\n{y}")
print(f"- Left limits of each cover interval: {cover.left_limits_}")
print(f"- Right limits of each cover interval: {cover.right_limits_}")
- Cover:
[[ True False False False False False]
[ True False False False False False]
[ True True False False False False]
[False True False False False False]
[False True False False False False]
[False False True False False False]
[False False True True False False]
[False False True True False False]
[False False False True False False]
[False False False False True False]
[False False False False True False]
[False False False False True True]
[False False False False False True]]
- Left limits of each cover interval: [-inf 1. 3. 5. 8. 9.]
- Right limits of each cover interval: [ 3. 5. 8. 9. 27. inf]
The left_limits_ and right_limits_ attributes give you the open intervals which produce the cover, represented as a boolean array y. The ith column of y tells you which elements of X are in the ith cover set ("interval"), as follows:
for i in range(n_intervals):
    print(f"Cover set {i}: {X[y[:, i]]}")
Cover set 0: [1 1 2]
Cover set 1: [2 3 3]
Cover set 2: [5 7 7]
Cover set 3: [7 7 8]
Cover set 4: [ 9 9 18]
Cover set 5: [18 27]
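If it helps, the same cover sets can be recovered by hand from the limits printed above, without gtda. This is only a minimal sketch: the limits are copied from the output in this thread, and the variable names `left`, `right`, `membership` and `cover_sets` are just illustrative.

```python
import numpy as np

X = np.array([1, 1, 2, 3, 3, 5, 7, 7, 8, 9, 9, 18, 27])

# Limits copied from the output printed in this thread
left = np.array([-np.inf, 1., 3., 5., 8., 9.])
right = np.array([3., 5., 8., 9., 27., np.inf])

# Entry x belongs to cover set i when left[i] < x < right[i] (open intervals).
# Broadcasting a column of X against the rows of limits gives a
# (len(X), n_intervals) boolean array with the same layout as y.
membership = (X[:, None] > left) & (X[:, None] < right)

cover_sets = [X[membership[:, i]] for i in range(len(left))]
for i, cover_set in enumerate(cover_sets):
    print(f"Cover set {i}: {cover_set}")
```

Both inequalities are strict, which is why 3 lands only in cover set 1 and not in cover sets 0 or 2.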
As you can see, it does what it says on the "cover" (i.e. in the docstring): "approximately the same number of unique values from X is contained in each cover interval." In this case, 2 unique values from X are mapped to each cover interval.
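To connect this back to the docstring quoted earlier, here is a rough sketch of the procedure it describes (rank, cover the rank range uniformly, then map the interval ends back to values of X). It is not gtda's actual code: the function name `balanced_limits_sketch` is hypothetical, and the formula for the uniform interval length is my assumption about how `overlap_frac` is applied (each interval overlaps its neighbour by that fraction of its own length). Under that assumption it reproduces the limits printed above for this example.

```python
import numpy as np


def balanced_limits_sketch(X, n_intervals, overlap_frac):
    """Hypothetical re-derivation of the balanced limits; not gtda's code."""
    # Sorted distinct values: uniq[r - 1] is the value with (dense) rank r,
    # so equal entries of X share the same rank and ranks start at 1.
    uniq = np.unique(X)
    n_ranks = len(uniq)

    # Cover the rank range (0.5, N + 0.5) uniformly.
    lo, hi = 0.5, n_ranks + 0.5
    # ASSUMPTION: consecutive intervals overlap by overlap_frac * length.
    length = (hi - lo) / (n_intervals - overlap_frac * (n_intervals - 1))
    alphas = np.linspace(lo, hi - length, num=n_intervals)
    betas = alphas + length

    # Map rank-space ends back to values: floor for left, ceiling for right.
    left_ranks = np.clip(np.floor(alphas).astype(int), 1, n_ranks)
    right_ranks = np.clip(np.ceil(betas).astype(int), 1, n_ranks)
    left = uniq[left_ranks - 1].astype(float)
    right = uniq[right_ranks - 1].astype(float)

    # As in the printed output, the outermost limits are infinite.
    left[0], right[-1] = -np.inf, np.inf
    return left, right


X = np.array([1, 1, 2, 3, 3, 5, 7, 7, 8, 9, 9, 18, 27])
left, right = balanced_limits_sketch(X, n_intervals=6, overlap_frac=0.2)
print(left)
print(right)
```

Because the intervals are built on ranks rather than on raw values, the large gaps between 9, 18 and 27 do not stretch the last intervals, which is what keeps the number of unique values per interval roughly constant.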
I can't figure out what CubicalCover(kind='balanced') does. Any suggestions?