Open ESapenaVentura opened 10 months ago
Types of issues:
Due to a PUT request, the submissionDate field has been deleted
2184e63d-82d8-4ab2-839e-e93f8395f568: project entity
08fb10df-32e5-456c-9882-e33fcd49077a: Supplementary file (Spreadsheet) DUE TO project
c16a754f-5da3-46ed-8c1e-6426af2ef625: this is an old dataset whose staging area was deleted. The script to reconstruct the old datasets is reconstructing the processes with old versions (9.0.0 instead of 9.2.0)
e526d91d-cf3a-44cb-80c5-fd7676b55a1d
c4077b3c-5c98-4d26-a614-246d12c2e5d7
2184e63d-82d8-4ab2-839e-e93f8395f568
Connect to the MongoDB and update the project submissionDate to match updateDate:
import uuid
import pymongo

# javaLegacy: encode/decode UUIDs in the Java driver's legacy byte order
db = pymongo.MongoClient('mongodb://localhost:27017/', uuidRepresentation='javaLegacy')
admin = db['admin']
projects = admin['project']
# Read the same project that is updated below, so that its own updateDate is copied
project = projects.find_one({'uuid': {'uuid': uuid.UUID('2184e63d-82d8-4ab2-839e-e93f8395f568')}})
ok = projects.update_one({'uuid': {'uuid': uuid.UUID('2184e63d-82d8-4ab2-839e-e93f8395f568')}}, update={'$set': {'submissionDate': project['updateDate']}})
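The same update can be wrapped in a small helper so that the read and the write are guaranteed to use one filter. This is only a sketch: `copy_update_date` is a hypothetical name, and `projects` is assumed to be a pymongo collection (or anything exposing `find_one` and `update_one` with the same shape).

```python
import uuid


def copy_update_date(projects, project_uuid):
    """Set submissionDate to the project's own updateDate.

    Returns True when exactly one document matched the filter.
    """
    # Build the filter once so find_one and update_one address the same project.
    query = {'uuid': {'uuid': uuid.UUID(project_uuid)}}
    project = projects.find_one(query)
    if project is None:
        raise LookupError(f'no project with uuid {project_uuid}')
    result = projects.update_one(
        query, {'$set': {'submissionDate': project['updateDate']}}
    )
    return result.matched_count == 1
```

Checking the return value (or the `LookupError`) gives an early signal when the UUID filter did not match anything, for example because of a UUID representation mismatch.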
Modify staging area to avoid re-export
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/2184e63d-82d8-4ab2-839e-e93f8395f568/metadata/project/2184e63d-82d8-4ab2-839e-e93f8395f568_2023-10-31T11:26:09.829000Z.json | jq -jc '.provenance.submission_date = "2023-11-06T15:39:46.916Z"' > 2184e63d-82d8-4ab2-839e-e93f8395f568_2023-10-31T11:26:09.829000Z.json
gsutil cp 2184e63d-82d8-4ab2-839e-e93f8395f568_2023-10-31T11:26:09.829000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/2184e63d-82d8-4ab2-839e-e93f8395f568/metadata/project/2184e63d-82d8-4ab2-839e-e93f8395f568_2023-10-31T11:26:09.829000Z.json
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/2184e63d-82d8-4ab2-839e-e93f8395f568/metadata/supplementary_file/6d27a39f-d9a9-5bef-9f06-3e46a7fb0032_2023-11-06T15:24:55.140000Z.json | jq -jc '.provenance.submission_date = "2023-11-06T15:39:46.916Z"' > 6d27a39f-d9a9-5bef-9f06-3e46a7fb0032_2023-11-06T15:24:55.140000Z.json
gsutil cp 6d27a39f-d9a9-5bef-9f06-3e46a7fb0032_2023-11-06T15:24:55.140000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/2184e63d-82d8-4ab2-839e-e93f8395f568/metadata/supplementary_file/6d27a39f-d9a9-5bef-9f06-3e46a7fb0032_2023-11-06T15:24:55.140000Z.json
08fb10df-32e5-456c-9882-e33fcd49077a
Connect to the MongoDB and update the project submissionDate to match updateDate:
import uuid
import pymongo

# javaLegacy: encode/decode UUIDs in the Java driver's legacy byte order
db = pymongo.MongoClient('mongodb://localhost:27017/', uuidRepresentation='javaLegacy')
admin = db['admin']
projects = admin['project']
project = projects.find_one({'uuid': {'uuid': uuid.UUID('08fb10df-32e5-456c-9882-e33fcd49077a')}})
ok = projects.update_one({'uuid': {'uuid': uuid.UUID('08fb10df-32e5-456c-9882-e33fcd49077a')}}, update={'$set': {'submissionDate': project['updateDate']}})
Modify staging area to avoid re-export
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/08fb10df-32e5-456c-9882-e33fcd49077a/metadata/project/08fb10df-32e5-456c-9882-e33fcd49077a_2023-11-23T14:56:59.852000Z.json | jq -jc '.provenance.submission_date = "2023-11-23T14:56:59.852Z"' > 08fb10df-32e5-456c-9882-e33fcd49077a_2023-11-23T14:56:59.852000Z.json
gsutil cp 08fb10df-32e5-456c-9882-e33fcd49077a_2023-11-23T14:56:59.852000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/08fb10df-32e5-456c-9882-e33fcd49077a/metadata/project/08fb10df-32e5-456c-9882-e33fcd49077a_2023-11-23T14:56:59.852000Z.json
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/08fb10df-32e5-456c-9882-e33fcd49077a/metadata/supplementary_file/b41ebd9b-c1ab-50bf-85b4-ba53fd10268b_2023-11-24T09:45:31.006000Z.json | jq -jc '.provenance.submission_date = "2023-11-24T09:45:31.006Z"' > b41ebd9b-c1ab-50bf-85b4-ba53fd10268b_2023-11-24T09:45:31.006000Z.json
gsutil cp b41ebd9b-c1ab-50bf-85b4-ba53fd10268b_2023-11-24T09:45:31.006000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/08fb10df-32e5-456c-9882-e33fcd49077a/metadata/supplementary_file/b41ebd9b-c1ab-50bf-85b4-ba53fd10268b_2023-11-24T09:45:31.006000Z.json
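Because the `submission_date` values are typed by hand into the jq filters, a malformed timestamp is easy to upload unnoticed. As a hedged alternative, the edit could be done in Python with a format check first. `set_submission_date` and `TIMESTAMP_FORMAT` are hypothetical names, and the timestamp layout is an assumption inferred from the values used in the commands above.

```python
import json
from datetime import datetime

# Assumed layout, inferred from values such as "2023-11-06T15:39:46.916Z".
TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def set_submission_date(document_text, submission_date):
    """Return compact JSON with provenance.submission_date set.

    Raises ValueError if the timestamp does not match TIMESTAMP_FORMAT,
    for example when a file suffix was pasted along with the date.
    """
    datetime.strptime(submission_date, TIMESTAMP_FORMAT)
    document = json.loads(document_text)
    document.setdefault('provenance', {})['submission_date'] = submission_date
    # Compact separators mirror the -c flag of jq.
    return json.dumps(document, separators=(',', ':'))
```

The returned string can be written to a local file and uploaded with `gsutil cp` exactly as in the commands above.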
c16a754f-5da3-46ed-8c1e-6426af2ef625
# Descriptors
gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c16a754f-5da3-46ed-8c1e-6426af2ef625/descriptors/analysis_file/
gsutil cp gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c16a754f-5da3-46ed-8c1e-6426af2ef625/links/ded0820d-4c23-56f6-81ec-53d376cde4a9_2023-11-06T11:55:59.994000Z_c16a754f-5da3-46ed-8c1e-6426af2ef625.json ded0820d-4c23-56f6-81ec-53d376cde4a9_2023-11-06T11:55:59.994000Z_c16a754f-5da3-46ed-8c1e-6426af2ef625.json
gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c16a754f-5da3-46ed-8c1e-6426af2ef625/links/
gsutil cp ded0820d-4c23-56f6-81ec-53d376cde4a9_2023-11-06T11:55:59.994000Z_c16a754f-5da3-46ed-8c1e-6426af2ef625.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c16a754f-5da3-46ed-8c1e-6426af2ef625/links/ded0820d-4c23-56f6-81ec-53d376cde4a9_2023-11-06T11:55:59.994000Z_c16a754f-5da3-46ed-8c1e-6426af2ef625.json
gsutil ls gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c16a754f-5da3-46ed-8c1e-6426af2ef625/metadata/ | grep -v "supplementary_file" | grep -v "project" | xargs -I{} sh -c "gsutil -m rm -r {}"
e526d91d-cf3a-44cb-80c5-fd7676b55a1d/c4077b3c-5c98-4d26-a614-246d12c2e5d7
c4077b3c-5c98-4d26-a614-246d12c2e5d7
e526d91d-cf3a-44cb-80c5-fd7676b55a1d
gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/e526d91d-cf3a-44cb-80c5-fd7676b55a1d/
gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c4077b3c-5c98-4d26-a614-246d12c2e5d7/
c4077b3c-5c98-4d26-a614-246d12c2e5d7
e526d91d-cf3a-44cb-80c5-fd7676b55a1d
Monitoring export:
c4077b3c-5c98-4d26-a614-246d12c2e5d7
e526d91d-cf3a-44cb-80c5-fd7676b55a1d
Fixed issue with 08fb10df-32e5-456c-9882-e33fcd49077a: the date format was wrong.
Working on e526d91d-cf3a-44cb-80c5-fd7676b55a1d: it failed the second round of validation due to a crc32c mismatch. This issue was already seen in R17, but there is no lead on what caused it.
e526d91d-cf3a-44cb-80c5-fd7676b55a1d
Failing validation because of a mismatch in crc32c.
The crc32c values for SRR11798395_R2.fastq.gz are:
ad2d4735
kTqVhA==
913a9584
I don't know how the crc32c can be different for all three, which makes me think that even if we re-uploaded and exported the project the error could come back
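One thing that may be worth checking before comparing the three values: Google Cloud Storage reports CRC32C as a base64 string, while other tools print it as hex. Assuming the `==`-terminated value above is that base64 form, it can be normalised to hex as in the sketch below (the function names are hypothetical). The labels saying where each value came from did not survive in this thread, so this is an observation, not a diagnosis.

```python
import base64


def crc32c_base64_to_hex(encoded):
    """Decode a base64 CRC32C (4 big-endian bytes) into lowercase hex."""
    return base64.b64decode(encoded).hex()


def distinct_crc32c(values):
    """Normalise a mix of hex and base64 checksums and return the distinct hex values."""
    normalised = set()
    for value in values:
        # A 4-byte checksum in base64 is always 6 characters plus "==" padding.
        if value.endswith('=='):
            normalised.add(crc32c_base64_to_hex(value))
        else:
            normalised.add(value.lower())
    return normalised


print(crc32c_base64_to_hex('kTqVhA=='))  # 913a9584
print(distinct_crc32c(['ad2d4735', 'kTqVhA==', '913a9584']))
```

Under that assumption `kTqVhA==` and `913a9584` are the same checksum, so only two distinct values are in play, and the open question becomes which system produced `ad2d4735`.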
We opted for a workaround since we didn't have the capacity to investigate further: we skipped the soft deletion and the update of the linking, and only exported the project json.
@amnonkhen if you have some time can you please look into this error? If you think re-uploading the project could solve it I can try it once I'm back from AL
moving to the icebox - when we have capacity we can prioritise
From Samn
EBI - 08fb10df-32e5-456c-9882-e33fcd49077a
EBI - 2184e63d-82d8-4ab2-839e-e93f8395f568
EBI - c16a754f-5da3-46ed-8c1e-6426af2ef625
EBI - e526d91d-cf3a-44cb-80c5-fd7676b55a1d
EBI - c4077b3c-5c98-4d26-a614-246d12c2e5d7
Acceptance criteria
2184e63d-82d8-4ab2-839e-e93f8395f568
08fb10df-32e5-456c-9882-e33fcd49077a
c16a754f-5da3-46ed-8c1e-6426af2ef625
c4077b3c-5c98-4d26-a614-246d12c2e5d7
e526d91d-cf3a-44cb-80c5-fd7676b55a1d