algolia / cli

🔍 Algolia’s official CLI devtool
https://www.algolia.com/doc/tools/cli
MIT License

feat(commands) Add index analyze command #136

Closed: clemfromspace closed this 7 months ago

clemfromspace commented 7 months ago

Index Analyze Command

This PR contains the CLI implementation of the index analyzer tool from https://github.com/algolia/tools.

This command displays record statistics for the specified index: how often each attribute appears and which value types it holds. This helps you identify records (or attributes) that do not conform to the rest of the dataset, e.g. numeric attributes that are sometimes null.

$ algolia -p media indices analyze prod_MEDIA

KEY                                 COUNT  %        TYPES                                            USED IN SETTINGS
backdrop_path                       1000   100.00%  string: 100.00%                                  []
bayesian_avg                        1000   100.00%  numeric: 100.00%                                 []
cast                                1000   100.00%  array: 100.00%                                   []
cast_lead                           1000   100.00%  array: 100.00%                                   []
created_by                          1000   100.00%  array: 100.00%                                   []
directors                           1000   100.00%  array: 100.00%                                   []
first_air_date                      1000   100.00%  numeric: 98.90%, null: 1.10%                     []
genres                              1000   100.00%  array: 100.00%                                   [attributesForFaceting searchableAttributes]
in_production                       1000   100.00%  boolean: 100.00%                                 []
last_air_date                       1000   100.00%  numeric: 100.00%                                 [customRanking]
last_episode_to_air                 1000   100.00%  object: 100.00%                                  []
last_episode_to_air.air_date        1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.episode_number  1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.name            1000   100.00%  null: 63.00%, string: 37.00%                     []
last_episode_to_air.overview        1000   100.00%  string: 29.80%, null: 70.20%                     []
last_episode_to_air.season_number   1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.still_path      1000   100.00%  null: 69.20%, string: 30.80%                     []
last_episode_to_air.vote_average    1000   100.00%  numeric: 100.00%                                 []
networks                            1000   100.00%  array: 100.00%                                   []
next_episode_to_air                 1000   100.00%  object: 69.00%, null: 31.00%                     []
next_episode_to_air.air_date        690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.episode_number  690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.name            690    69.00%   string: 19.60%, undefined: 31.00%, null: 49.40%  []
next_episode_to_air.overview        690    69.00%   undefined: 31.00%, null: 56.40%, string: 12.60%  []
next_episode_to_air.season_number   690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.still_path      690    69.00%   string: 9.20%, undefined: 31.00%, null: 59.80%   []
next_episode_to_air.vote_average    690    69.00%   numeric: 69.00%, undefined: 31.00%               []
number_of_episodes                  1000   100.00%  numeric: 100.00%                                 []
number_of_seasons                   1000   100.00%  numeric: 100.00%                                 []
objectID                            1000   100.00%  string: 100.00%                                  []
origin_country                      1000   100.00%  array: 100.00%                                   []
original_language                   1000   100.00%  string: 100.00%                                  []
original_title                      1000   100.00%  string: 100.00%                                  [searchableAttributes]
overview                            1000   100.00%  null: 33.00%, string: 67.00%                     []
popularity                          1000   100.00%  numeric: 100.00%                                 [customRanking]
popularity_bucketed                 1000   100.00%  numeric: 100.00%                                 []
poster_path                         1000   100.00%  string: 100.00%                                  []
record_type                         1000   100.00%  string: 100.00%                                  [attributesForFaceting]
seasons                             1000   100.00%  array: 100.00%                                   []
spoken_languages                    1000   100.00%  array: 100.00%                                   [attributesForFaceting]
status                              1000   100.00%  string: 100.00%                                  [attributesForFaceting]
tagline                             1000   100.00%  null: 87.40%, string: 12.60%                     []
title                               1000   100.00%  string: 100.00%                                  [searchableAttributes]
type                                1000   100.00%  string: 100.00%                                  [attributesForFaceting]
videos                              1000   100.00%  array: 100.00%                                   []
vote_average                        1000   100.00%  numeric: 100.00%                                 []
vote_count                          1000   100.00%  numeric: 100.00%                                 []
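
To make the table above easier to interpret, here is a minimal Go sketch of how such per-attribute statistics could be computed. It is not the code from this PR: `KeyStats`, `typeOf`, `walk`, and `Analyze` are hypothetical names, and the records are assumed to be already decoded with `encoding/json` (so numbers arrive as `float64`). Nested objects are flattened into dotted paths such as `next_episode_to_air.name`. The sketch does not emit the `undefined` bucket; that share could be derived as the total record count minus the key's `Count`.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyStats (hypothetical) holds, for one attribute path, how many records
// contain the key and how often each value type was seen.
type KeyStats struct {
	Count int
	Types map[string]int
}

// typeOf maps a decoded JSON value to a type label like those in the TYPES column.
func typeOf(v interface{}) string {
	switch v.(type) {
	case nil:
		return "null"
	case string:
		return "string"
	case float64, int:
		return "numeric"
	case bool:
		return "boolean"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	default:
		return "unknown"
	}
}

// walk records every key of obj under prefix and recurses into nested objects.
func walk(obj map[string]interface{}, prefix string, stats map[string]*KeyStats) {
	for k, v := range obj {
		path := prefix + k
		s, ok := stats[path]
		if !ok {
			s = &KeyStats{Types: map[string]int{}}
			stats[path] = s
		}
		s.Count++
		s.Types[typeOf(v)]++
		if child, isObj := v.(map[string]interface{}); isObj {
			walk(child, path+".", stats)
		}
	}
}

// Analyze computes per-attribute statistics for a slice of decoded records.
func Analyze(records []map[string]interface{}) map[string]*KeyStats {
	stats := map[string]*KeyStats{}
	for _, r := range records {
		walk(r, "", stats)
	}
	return stats
}

func main() {
	// Two made-up records, only for illustration.
	records := []map[string]interface{}{
		{"objectID": "1", "first_air_date": 1262304000.0,
			"next_episode_to_air": map[string]interface{}{"name": "Pilot"}},
		{"objectID": "2", "first_air_date": nil, "next_episode_to_air": nil},
	}
	stats := Analyze(records)

	// Sort keys so the rows print in a stable, alphabetical order.
	keys := make([]string, 0, len(stats))
	for k := range stats {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	total := float64(len(records))
	for _, k := range keys {
		s := stats[k]
		fmt.Printf("%-28s %d  %.2f%%  %v\n", k, s.Count, 100*float64(s.Count)/total, s.Types)
	}
}
```

In this sketch a key whose parent object is null is simply never visited, which matches the lower COUNT shown above for the `next_episode_to_air.*` rows.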
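
Passing --only with an attribute name (here genres) lists the distribution of that attribute's values instead:

$ algolia -p media indices analyze prod_MEDIA --only genres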

VALUE               COUNT  %
Drama               284    28.40%
Comedy              197    19.70%
Reality             165    16.50%
Documentary         104    10.40%
Animation           88     8.80%
Family              75     7.50%
Crime               71     7.10%
Talk                66     6.60%
Action & Adventure  62     6.20%
Sci-Fi & Fantasy    51     5.10%
Mystery             44     4.40%
News                32     3.20%
Soap                31     3.10%
Kids                21     2.10%
War & Politics      10     1.00%
Western             1      0.10%
Music               1      0.10%
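
The counts in this listing add up to 1303 while the percentages are relative to the 1000 analyzed records (284 / 1000 = 28.40%), which suggests that each element of an array attribute is counted separately. The following Go sketch shows one way to produce such a distribution under that assumption. `ValueCount` and `CountValues` are hypothetical names, not taken from the PR, and records are again assumed to be decoded with `encoding/json`.

```go
package main

import (
	"fmt"
	"sort"
)

// ValueCount (hypothetical) pairs a distinct value with its number of occurrences.
type ValueCount struct {
	Value string
	Count int
}

// CountValues tallies the values of attr across records. Array values are
// counted element by element, so one record can contribute to several rows.
func CountValues(records []map[string]interface{}, attr string) []ValueCount {
	counts := map[string]int{}
	for _, r := range records {
		v, ok := r[attr]
		if !ok {
			continue // attribute missing from this record
		}
		if items, isArr := v.([]interface{}); isArr {
			for _, item := range items {
				counts[fmt.Sprint(item)]++
			}
		} else {
			counts[fmt.Sprint(v)]++
		}
	}

	out := make([]ValueCount, 0, len(counts))
	for v, c := range counts {
		out = append(out, ValueCount{Value: v, Count: c})
	}
	// Most frequent first; ties are broken alphabetically for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	// Made-up records, only for illustration.
	records := []map[string]interface{}{
		{"genres": []interface{}{"Drama", "Comedy"}},
		{"genres": []interface{}{"Drama"}},
		{"title": "no genres here"},
	}
	total := float64(len(records))
	for _, vc := range CountValues(records, "genres") {
		// Percentages are relative to the record count, as in the listing above.
		fmt.Printf("%-10s %d  %.2f%%\n", vc.Value, vc.Count, 100*float64(vc.Count)/total)
	}
}
```

Because percentages are computed against the number of records rather than the number of values, they can sum to more than 100% for array attributes.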