HumanCellAtlas / ingest-central

Ingest Central is the hub repository for the ingest service
Apache License 2.0

Remedy links.json in prod #467

Open · rolando-ebi opened 5 years ago

rolando-ebi commented 5 years ago

Due to a bug in the exporter, links.json documents in prod may contain duplicate links. In addition, they may erroneously list supplementary files as inputs to analysis bundles. We should write a script to remedy this for all affected production bundles.
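The two defects described above can be fixed in a single pass over a links.json document. The following is a minimal sketch, not the code in the feature branch. It assumes a simplified schema in which the document has a top-level `"links"` list and each link is a dict whose `"inputs"` is a list of file UUIDs. It also assumes the caller already knows which UUIDs belong to supplementary files. The function name `remedy_links` and the field names other than `"links"`/`"inputs"` in the example are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def remedy_links(links_doc: Dict[str, Any],
                 supplementary_file_uuids: Iterable[str]) -> Dict[str, Any]:
    """Return a corrected copy of a links.json document (hypothetical helper).

    Assumed schema: links_doc["links"] is a list of dicts, and an "inputs"
    entry (when present) is a list of file UUID strings.
    """
    supplementary = set(supplementary_file_uuids)
    seen = set()
    fixed_links: List[Dict[str, Any]] = []
    for link in links_doc.get("links", []):
        link = dict(link)  # shallow copy so the caller's document is not mutated
        if "inputs" in link:
            # Drop supplementary files that were wrongly recorded as inputs.
            link["inputs"] = [u for u in link["inputs"] if u not in supplementary]
        # Canonical JSON string as a hashable key; done after filtering so links
        # that only differed by a supplementary input are also deduplicated.
        key = json.dumps(link, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        fixed_links.append(link)
    fixed = dict(links_doc)
    fixed["links"] = fixed_links
    return fixed


if __name__ == "__main__":
    doc = {
        "links": [
            {"process": "p1", "inputs": ["f1", "supp1"], "outputs": ["f2"]},
            {"process": "p1", "inputs": ["f1", "supp1"], "outputs": ["f2"]},
        ]
    }
    print(remedy_links(doc, {"supp1"})["links"])
    # [{'process': 'p1', 'inputs': ['f1'], 'outputs': ['f2']}]
```

Filtering before deduplicating matters: two links that differ only by a stray supplementary input become identical once it is removed, and should collapse to one.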

Acceptance criteria:

aaclan-ebi commented 5 years ago

Consider what effect correcting links.json would have on downstream components.

aaclan-ebi commented 5 years ago

Feature branch: https://github.com/HumanCellAtlas/ingest-client/tree/task/fix-links
Script location: https://github.com/HumanCellAtlas/ingest-client/blob/task/fix-links/scripts/fix_bundle_links.py

aaclan-ebi commented 5 years ago

Only analysis bundles are affected.

{
    "005d611a-14d5-4fbf-846e-571a1f874f70": {
        "bundles_to_correct": 6,
        "not_found": 6,
        "bundle_count": 18
    },
    "2043c65a-1cf8-4828-a656-9e247d4e64f1": {
        "bundles_to_correct": 1733,
        "not_found": 0,
        "bundle_count": 3466
    },
    "c4077b3c-5c98-4d26-a614-246d12c2e5d7": {
        "bundles_to_correct": 7,
        "not_found": 7,
        "bundle_count": 21
    },
    "cddab57b-6868-4be4-806f-395ed9dd635a": {
        "bundles_to_correct": 2544,
        "not_found": 0,
        "bundle_count": 5088
    },
    "f83165c5-e2ea-4d15-a5cf-33f3550bffde": {
        "bundles_to_correct": 7605,
        "not_found": 27,
        "bundle_count": 15287
    }
}
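To read the per-project summary above at a glance, it can be totalled across projects. The sketch below uses only the standard library and the counts copied from the comment. The `totals` helper is hypothetical and not part of `fix_bundle_links.py`. It makes no assumption about what `not_found` means beyond summing it.

```python
import json

# Per-project counts copied verbatim from the summary above.
SUMMARY_JSON = """
{
    "005d611a-14d5-4fbf-846e-571a1f874f70": {"bundles_to_correct": 6, "not_found": 6, "bundle_count": 18},
    "2043c65a-1cf8-4828-a656-9e247d4e64f1": {"bundles_to_correct": 1733, "not_found": 0, "bundle_count": 3466},
    "c4077b3c-5c98-4d26-a614-246d12c2e5d7": {"bundles_to_correct": 7, "not_found": 7, "bundle_count": 21},
    "cddab57b-6868-4be4-806f-395ed9dd635a": {"bundles_to_correct": 2544, "not_found": 0, "bundle_count": 5088},
    "f83165c5-e2ea-4d15-a5cf-33f3550bffde": {"bundles_to_correct": 7605, "not_found": 27, "bundle_count": 15287}
}
"""


def totals(summary):
    """Sum each counter across all projects (hypothetical helper)."""
    keys = ("bundles_to_correct", "not_found", "bundle_count")
    return {k: sum(project[k] for project in summary.values()) for k in keys}


if __name__ == "__main__":
    t = totals(json.loads(SUMMARY_JSON))
    print(t)
    # {'bundles_to_correct': 11895, 'not_found': 40, 'bundle_count': 23880}
    print(f"{t['bundles_to_correct'] / t['bundle_count']:.1%} of bundles need correcting")
    # 49.8% of bundles need correcting
```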

A list of affected bundles by project is available here: https://app.zenhub.com/files/132741306/8b1dae89-92c2-4664-98d4-bce555335e26/download

The script was run against the projects listed at https://tracker.data.humancellatlas.org/

justincc commented 5 years ago

This is blocked pending stakeholder input on whether the costs of doing this (co-ordination and the update itself) are worth the benefits, at least for now. @morrisonnorman is gathering this input.