AnotherCodeArtist / medien-transparenz.at

Apache License 2.0

Organisation Reference Collection is Incomplete #53

Closed AnotherCodeArtist closed 8 years ago

AnotherCodeArtist commented 8 years ago

The latest import of RTR data (June 2016) has shown that there are several problems with the import routine. Apparently a good part of the organizations are missing from the organization collection, including for example "Bundesministerium für Inneres". This matters because for every imported transfer, the corresponding document needs to be looked up and a reference to it stored. When no such reference document was found, the entire transfer was ignored. Since this behavior is erroneous and unacceptable, the following changes have been made to the master branch:

So there are the following TODOs:

relative-progressio commented 8 years ago

This was really interesting. The list of zip codes did not include a few zip codes (e.g. the one for Bundesministerium für Inneres). To resolve all possible missing references, I edited the list of organisations and the newest report, since the newest report seems to include all previous data. This can now be automated with OpenRefine. I uploaded all the steps so that OpenRefine can execute them. Before they can run, the steps have to be uploaded into the OpenRefine project.

Errors (from MongoDB or during upload) and the entries that caused them are now shown in the user interface.
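To illustrate the idea of reporting unresolved entries instead of dropping them silently, here is a minimal sketch in Python. It is not the code from the repository: the function name `resolve_references`, the field names (`name`, `organisation`, `organisation_ref`, `_id`) and matching by exact name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ImportResult:
    # Transfers for which a reference organisation was found.
    imported: list = field(default_factory=list)
    # (transfer, reason) pairs that can be shown to the user.
    errors: list = field(default_factory=list)


def resolve_references(transfers, organisations):
    """Attach an organisation reference to each transfer and record failures.

    Hypothetical helper: matching by exact name is an assumption.
    """
    # Index the reference collection by organisation name.
    by_name = {org["name"]: org["_id"] for org in organisations}
    result = ImportResult()
    for transfer in transfers:
        org_id = by_name.get(transfer["organisation"])
        if org_id is None:
            # Keep the entry together with the reason instead of ignoring it.
            result.errors.append((transfer, "no reference organisation found"))
            continue
        result.imported.append({**transfer, "organisation_ref": org_id})
    return result


# Example data (made up for illustration).
organisations = [{"_id": 1, "name": "Bundeskanzleramt"}]
transfers = [
    {"organisation": "Bundeskanzleramt", "amount": 1000.0},
    {"organisation": "Bundesministerium für Inneres", "amount": 2500.0},
]
result = resolve_references(transfers, organisations)
```

With this shape, `result.errors` carries both the failing entry and a reason, so an interface can list exactly which rows need a missing reference added.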

The mentioned quality check is implemented. Additionally, all information about possible sources of error is shown.

Implemented at: refined-upload-branch

relative-progressio commented 8 years ago

We need to leave the Bundesministerium entries unchanged.