KitWallace / AIDVIEW-DB

A repository, browser and API for IATI activities

missing data in facet selection and analysis #123

Open KitWallace opened 11 years ago

KitWallace commented 11 years ago

The facet summaries and selection by facet should include an occurrence for missing values. Knowing how many activities have no value for a facet is necessary to evaluate the quality of the facet data, and selecting activities with missing data is useful for exploring data quality.
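As a rough illustration of what a facet summary with a missing bucket could look like, here is a minimal Python sketch. The data model (activities as dicts with a `value` and a list of codes per facet) and the function name `facet_summary` are assumptions made for the example, not the repository's actual structures.

```python
from collections import Counter


def facet_summary(activities, facet):
    """Count activities per facet code and tally those with no code at all.

    `activities` is a hypothetical list of dicts; each may hold a list of
    codes under the facet name and a numeric "value".
    """
    counts = Counter()
    missing_count = 0
    missing_value = 0.0
    for activity in activities:
        codes = activity.get(facet) or []
        if not codes:
            # No code recorded for this facet: count it as missing.
            missing_count += 1
            missing_value += activity.get("value", 0.0)
        for code in codes:
            counts[code] += 1
    return {
        "occurrences": dict(counts),
        "missing-count": missing_count,
        "missing-value": missing_value,
        "corpus-count": len(activities),
    }


# Illustrative data only, not taken from the IATI corpus.
activities = [
    {"id": "A1", "value": 100.0, "Country": ["KE"]},
    {"id": "A2", "value": 50.0, "Country": []},
    {"id": "A3", "value": 25.0},
]
summary = facet_summary(activities, "Country")
```

Keeping the missing tally in its own keys, rather than as one more entry in `occurrences`, means the list of real codes stays clean.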

KitWallace commented 11 years ago

I have a script to compute the missing data for a facet. It's slow. For example, for Country:

<Country missing-count="53866" missing-value="1.6079336537832996E11" known-value="5.707493050123569E11" corpus-count="140144" missing-count-pc="38" missing-value-pc="22"/>

I.e., 38% of activities don't have a recipient-country (although they may have a region).
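The two percentage attributes appear to be consistent with the following arithmetic: missing-count-pc is the missing count over the corpus count, and missing-value-pc is the missing value over the sum of missing and known value, both rounded to whole percent. This is inferred from the reported figures, not taken from the script itself, and the function name is made up for the sketch.

```python
def missing_percentages(missing_count, corpus_count, missing_value, known_value):
    """Return (missing-count-pc, missing-value-pc) as whole percentages.

    Assumed formula, inferred from the reported figures:
      count pc = missing_count / corpus_count
      value pc = missing_value / (missing_value + known_value)
    """
    count_pc = round(100 * missing_count / corpus_count)
    total_value = missing_value + known_value
    value_pc = round(100 * missing_value / total_value) if total_value else 0
    return count_pc, value_pc


# Figures copied from the Country result.
country = missing_percentages(
    53866, 140144, 1.6079336537832996e11, 5.707493050123569e11
)
print(country)  # (38, 22)
```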

<Region missing-count="98203" missing-value="4.2624549106331714E11" known-value="3.05296800794E11" corpus-count="140144" missing-count-pc="70" missing-value-pc="58"/>

<Sector missing-count="35574" missing-value="4.905657918684E10" known-value="6.01604095653E11" corpus-count="140144" missing-count-pc="25" missing-value-pc="8"/>

That is surprisingly high, considering that there are multiple vocabularies and this is only DAC.

<SectorCategory missing-count="44241" missing-value="1.5882517577684E11" known-value="5.61485306924E11" corpus-count="140144" missing-count-pc="32" missing-value-pc="22"/>

<Funder missing-count="10347" missing-value="3.44081000243171E10" known-value="7.033782948044E11" corpus-count="140144" missing-count-pc="7" missing-value-pc="5"/>

<Reporter missing-count="0" missing-value="0" known-value="7.31551869384717E11" corpus-count="140144" missing-count-pc="0" missing-value-pc="0"/>

<Status missing-count="140144" missing-value="7.315518693847172E11" known-value="0" corpus-count="140144" missing-count-pc="100" missing-value-pc="100"/>

Can't simply add this as another facet occurrence, because it would then be added to the dropdown, and special code would be needed in the query API to compute the filter.
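One possible shape for that special code, sketched in Python purely for discussion: a reserved token selects activities with no value for the facet, and the dropdown is built only from real codes. The token `__missing__`, the function names, and the dict-based activity model are all hypothetical and not part of the existing query API.

```python
MISSING = "__missing__"  # hypothetical reserved token, not an existing API value


def matches(activity, facet, wanted):
    """Return True if the activity passes the filter facet=wanted."""
    codes = activity.get(facet) or []
    if wanted == MISSING:
        # Special case: select activities with no code for this facet.
        return not codes
    return wanted in codes


def dropdown_options(occurrences):
    """List the real facet codes only, so the reserved token never shows up."""
    return sorted(code for code in occurrences if code != MISSING)


# Illustrative data only.
activities = [
    {"id": "A1", "Country": ["KE"]},
    {"id": "A2", "Country": []},
    {"id": "A3"},
]
missing_ids = [a["id"] for a in activities if matches(a, "Country", MISSING)]
kenya_ids = [a["id"] for a in activities if matches(a, "Country", "KE")]
options = dropdown_options({"KE": 1, "UG": 4, MISSING: 2})
```

The trade-off is that the filter gains one extra branch, while the facet occurrence list itself stays limited to codes that exist in the data.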