KitWallace opened this issue 11 years ago
I have a script that computes the missing data for a facet. It's slow. For example, for Country:
`<Country missing-count="53866" missing-value="1.6079336537832996E11" known-value="5.707493050123569E11" corpus-count="140144" missing-count-pc="38" missing-value-pc="22"/>`
I.e., 38% of activities don't have a recipient-country (although they may have a region).
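A minimal sketch of how the per-facet attributes above could be computed, assuming a simplified in-memory activity record. `Activity`, `MissingSummary` and `summarize_missing` are hypothetical names, not the script's actual code. The percentage formulas are also an assumption: `missing-count-pc` as missing-count / corpus-count and `missing-value-pc` as missing-value / (missing-value + known-value), each rounded to the nearest integer. These formulas do reproduce the Country, Region and Sector figures shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    # Hypothetical, simplified activity record: its total value plus the codes
    # recorded for each facet. An absent or empty list means the facet is missing.
    value: float
    facets: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class MissingSummary:
    missing_count: int
    missing_value: float
    known_value: float
    corpus_count: int

    @property
    def missing_count_pc(self) -> int:
        # Assumed formula: share of activities with no code for the facet.
        if not self.corpus_count:
            return 0
        return round(100 * self.missing_count / self.corpus_count)

    @property
    def missing_value_pc(self) -> int:
        # Assumed formula: share of value held by those activities,
        # relative to missing + known value (not the corpus total).
        total = self.missing_value + self.known_value
        return round(100 * self.missing_value / total) if total else 0


def summarize_missing(activities: list[Activity], facet: str) -> MissingSummary:
    """Single pass over the corpus, splitting value into missing vs known."""
    missing_count = 0
    missing_value = 0.0
    known_value = 0.0
    for activity in activities:
        if activity.facets.get(facet):
            known_value += activity.value
        else:
            missing_count += 1
            missing_value += activity.value
    return MissingSummary(missing_count, missing_value, known_value, len(activities))
```

Feeding the Country figures into `MissingSummary(53866, 1.6079336537832996e11, 5.707493050123569e11, 140144)` gives the same 38 and 22 as the script's output.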
`<Region missing-count="98203" missing-value="4.2624549106331714E11" known-value="3.05296800794E11" corpus-count="140144" missing-count-pc="70" missing-value-pc="58"/>`
`<Sector missing-count="35574" missing-value="4.905657918684E10" known-value="6.01604095653E11" corpus-count="140144" missing-count-pc="25" missing-value-pc="8"/>`
That is surprisingly high, considering that there are multiple sector vocabularies and this figure covers only the DAC vocabulary.
`<SectorCategory missing-count="44241" missing-value="1.5882517577684E11" known-value="5.61485306924E11" corpus-count="140144" missing-count-pc="32" missing-value-pc="22"/>`
`<Funder missing-count="10347" missing-value="3.44081000243171E10" known-value="7.033782948044E11" corpus-count="140144" missing-count-pc="7" missing-value-pc="5"/>`
`<Reporter missing-count="0" missing-value="0" known-value="7.31551869384717E11" corpus-count="140144" missing-count-pc="0" missing-value-pc="0"/>`
`<Status missing-count="140144" missing-value="7.315518693847172E11" known-value="0" corpus-count="140144" missing-count-pc="100" missing-value-pc="100"/>`
This can't simply be added as another facet occurrence, because it would then appear in the dropdown, and special code would be needed in the query API to compute the filter.
The facet summaries and selection by facet should include a missing occurrence. Knowing which activities are missing a facet is necessary to evaluate the quality of that facet's data, and selecting activities with missing data is useful for exploring data quality.
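One way to reconcile the two requirements is sketched below, under stated assumptions: the summary reports the missing count separately from the real codes, so it never lands in the dropdown, and the filter treats a reserved token as a special case. `MISSING` (spelled `__missing__` here), `facet_summary` and `select` are hypothetical names for illustration, not part of the existing query API.

```python
from dataclasses import dataclass, field

# Hypothetical reserved token, e.g. a query like ?Country=__missing__.
MISSING = "__missing__"


@dataclass
class Activity:
    id: str
    facets: dict[str, list[str]] = field(default_factory=dict)


def facet_summary(activities: list[Activity], facet: str) -> tuple[dict[str, int], int]:
    """Return (counts per real code, missing count).

    Only the first element would feed the dropdown, so the missing
    occurrence is reported without becoming a selectable code.
    """
    counts: dict[str, int] = {}
    missing = 0
    for a in activities:
        codes = a.facets.get(facet, [])
        if not codes:
            missing += 1
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return counts, missing


def select(activities: list[Activity], facet: str, code: str) -> list[Activity]:
    """Filter by a facet code; the reserved token selects activities lacking the facet."""
    if code == MISSING:
        # The special case the query API would need.
        return [a for a in activities if not a.facets.get(facet)]
    return [a for a in activities if code in a.facets.get(facet, [])]
```

Keeping the missing count out of the occurrence map is the design choice here: the dropdown code stays unchanged, and only `select` has to know about the reserved token.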