HumanCellAtlas / secondary-analysis

Secondary Analysis Service of the Human Cell Atlas Data Coordination Platform
https://pipelines.data.humancellatlas.org/ui/
BSD 3-Clause "New" or "Revised" License

Fix "unknown" analysis file types #783

Closed kbergin closed 4 years ago

kbergin commented 5 years ago

When submitting analysis file metadata, we refer to a mapping of file extensions to file types and any file extension not in the map is marked as "unknown". Update the mapping to include any missing file extensions.

https://github.com/HumanCellAtlas/pipeline-tools/blob/master/adapter_pipelines/format_map_example.json

Issue is synchronized with this Jira Story. Attachments: file_format_map.json

kbergin commented 5 years ago

➤ Saman Ehsan commented:

I attached what the updated file_format_map.json would look like to fix these issues. Specifically, I added “npz”, “npy” and “csv.gz” file extensions and generalized the pattern for matching zarr files. This should resolve all of the files that have “unknown” file types at the moment.
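A minimal sketch of the kind of lookup described above, for readers without access to the attachment. It assumes the map pairs regular-expression patterns with file types and that the first matching pattern wins; the map contents, the `get_file_format` name, and the sample file names are all hypothetical and are not copied from file_format_map.json or pipeline-tools.

```python
import re

# Hypothetical pattern-to-type map (an assumption, not the real file_format_map.json).
FORMAT_MAP = {
    r"\.csv\.gz$": "csv.gz",
    r"\.csv$": "csv",
    r"\.npz$": "npz",
    r"\.npy$": "npy",
    r"\.zarr": "zarr",  # unanchored, so it also matches files stored inside a .zarr store
    r"\.bam$": "bam",
}


def get_file_format(path, format_map=FORMAT_MAP):
    """Return the type of the first pattern that matches path, else "unknown"."""
    for pattern, file_format in format_map.items():
        if re.search(pattern, path):
            return file_format
    return "unknown"


if __name__ == "__main__":
    for name in ["counts.csv.gz", "weights.npy", "cells.zarr!expression!.zarray", "notes.xyz"]:
        print(name, "->", get_file_format(name))
```

Leaving the zarr pattern unanchored is one way to "generalize" it: any file whose path contains `.zarr` gets the zarr type, rather than only files that end in a specific suffix.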

kbergin commented 5 years ago

➤ Saman Ehsan commented:

For reference, here are the logs from an SS2 workflow where you can see that most of the zarr files do not match any pattern in the file_format_map.json: https://00e9e64bac8618b42c41bcb6af9a180ebe18189dbbce03ae39-apidata.googleusercontent.com/download/storage/v1/b/hca-dcp-pipelines-prod-cromwell-execution/o/caas-cromwell-executions%2FAdapterSmartSeq2SingleCell%2Fd8642609-29f2-47fc-b1a5-125776ac10ee%2Fcall-submit%2Fsubmit_wdl.submit%2Fbeeaf3f8-072e-4402-8d4d-26cd72ac7a59%2Fcall-create_submission%2Fstdout?qk=AD5uMEtO2WkdowJ32XWP5T7-mqK4a-9V1sHCBYOHUbz7oAB4gwQXYctzGBGGcug7zHIjcDdK6TxanAMpnN33Y_z-NiQ_wLmjjcKqJ_wVgSJG9loomipbshVg9PK152J3ccJwjMST_wjMRXmQtTTe_n6hrQpBgn0dV_ODbF-PmocMX8Zt_gNocrzAunqohg4xhE2axHFfuScf1qRDUL1qICtfxbiutdkwv_dI8oeDQk6S10Kb6NWbDo3txg_eHRTgceeY_6pRJh1NLrYjtDoEL_aDT3bZ7gwYV5g-dVJa_RoYqFbJH7RLRoAGRCp-na6PYax7AmSh0pxJKCzfQ0MRNYmx0TwH7mkNpvksqVaw-UrB1KbTJuvEmLvPIKwfrRmGlkZMyWg2OXdyuxvD7ZKEoQFgj8DsR1HN6pPMNpX9L2jrvFCZu8uECGq-1bdDnrw758pSvk08i0v6OwDMHXna1EZE5vGmBwSwcvLXgoyLnk0xuGsDboBKmodVGyzdV1bwqJkK8OBV10CS3cYMeu_BnvMM5nhYqH0lIX6uyqcp5ChsMWLqzHH9IUEuZhSglMk04Udv-w8OVILm-6-deqXdETpgXqiehAZdF00M9SD2cjjfvLqvT7UcKYhZeRkYuQcSrUHEU8wAyAAtAKBDOtwXlla1wr5zUoSF-ISWukZZDfZHSpbCBy1Sb_ItcJihJaz2qq1GTmko9fo7FbInQ8d7PW7ogykV9NQRmpMd9PvKBbvZBNNHK4e57mNwwPnq2xesVIgX3gu2k2tvTov2fUvrZFj_QnI1qTH-sJDYAPKXywiIU1OOwMzPUAS4iZd4TdKVDFgz0ubZ_qYbPOcxdA6fENUj3dViL5WBrMUPV3stpCz_c38fbonDMj8CmaA2ki8zG7qlF9Aq08p80lPm0lFdoWl2y5qLTtVY9PEsirw2vjnQXE9m6SLUefEUSR2kEiNyqUrRZ_9IEjhmEjUapBMOp46Y_7xo11XGLlqFskTiJglPQNmbJGt3m2EeI77IiaErM_ckLEIX3ER02KVE2TDqZJEpXJttoBIT0SA4Xn3XJ6KCzlVIvgq1CLheJFFxazfGv8HP1Spsn9dS

kbergin commented 5 years ago

➤ Saman Ehsan commented:

And here is an example from an Optimus workflow: https://00e9e64bac89f559c5c708d6fb73d93840e96c75ef65725485-apidata.googleusercontent.com/download/storage/v1/b/hca-dcp-pipelines-prod-cromwell-execution/o/caas-cromwell-executions%2FAdapterOptimus%2F16e08724-c927-4124-96da-89101d16efa7%2Fcall-submit%2Fsubmit_wdl.submit%2F44aaa242-f157-4c09-888b-d3c386dd6ebf%2Fcall-create_submission%2Fstdout?qk=AD5uMEvpJxYkoV4Lcr8KShEg-_RlxUtvZ8bn6TBBcJXdlhlQx7JCmMiIm8mpsw3qoVUL7nyLhs8kA86Oa16WygIRliEeCp5DVs26_7BOghRsZCNxsNi1FLr07mBdFbFXNr0vL5zaLM-QFbXxeK5n231cDcNhZfeiW4XaqDzkr7nIA6K7db69QT98cDTOC48tG1izwuTNbA3jWQz8O5s6o5mEBvfrUgrdv68L9m5LGgwTT12FOaB4BWwa2W-LE8i6gAasl_QvgUiuNVNiy0ISOEhrujq5p4cBKTo8lA2KuVTodtiZzvrXejPD2Pn8HTOJA-KrDZpjsOIhfTTszmd5wXoCiGfLi3Bs3yGNdGRvDS7Ilj5b8ioZ7950mzAalmFm_qtKEjMgx7awBOHOgOWbcvbUJaFSv_Dkq_NWEz4kWDpRGaYWB80QVZOaL7UQR4Z_yKNa9-rL-GU7l1w1z9O1My2MfDEZzbSw-trOkEOW9ziu2gegXs5oLot7hA3UAENe2kuNnyxJsZQ7Zys40HRLK-UFqHhTTa7YvlaEO97hv1uNjCIH66PbPS_X9-3sc7gwNOWlFiQ0nVsm54J5Hm9I5oarJ-8D-qEQ4Y2nO9YCEOv838H9LEjbpqjj14faF4Mf8NDbVgtccH23a91NoY3Prkxp6xcRAppOK2HQ16_DD51zblYLwSfFmtIrEReHQE0L99gKC_lnQYeuiJhGdgkuVsvFLmgNNEhYzCJEOdb-2H8TfoFPTkilniJuCsj_9zi-1HYVHxZG_WDW0MSCv6zMOEdiF_BOYkhBkW0POtoeGAXB5htXF8EECzGUGEWX2TQX_6-Xw3Lcb4sEepNAA-Yel_6Mc8w69UMB2JO78sTIA_gbWmx8ZRdlBh-L8NEL8hNFLw3QWLFP0MRbdrYdKKbMLI26Ni4d4_8KA23l_8dHkXeye0G0YJgY7pkqhSHrA9f9lA6xy5P9griTC6Caf0Z8XXprxgDHoTOo2BekJPNWzi8Hw8HmWj0k4AKBDptOqUsuYk275n_ERn0y0IIsJVzhQtei0iQfL1RtTi2baprho9ycnlN_MEXZPjw

kbergin commented 5 years ago

➤ Chengchen Wang commented:

Unfortunately, both links are broken.

kbergin commented 5 years ago

➤ Saman Ehsan commented:

Here is the PR! https://github.com/HumanCellAtlas/pipeline-tools/pull/164

kbergin commented 4 years ago

➤ Saman Ehsan commented:

This is ready to go, but waiting on a fix from Azul for how they check the number of analysis files in a secondary analysis bundle.

kbergin commented 4 years ago

➤ Saman Ehsan commented:

The fix from Azul is in their dev environment, and they will let us know when it gets promoted up to production.

kbergin commented 4 years ago

➤ Saman Ehsan commented:

The issue has been fixed by Azul!

kbergin commented 4 years ago

➤ Saman Ehsan commented:

For QA, check an SS2 and an Optimus submission envelope and confirm that no files have the “unknown” file type. Note: the file metadata is paginated, and the links at the end of each page navigate forward and back through the rest of the results.

E.g.

SS2:

Cromwell workflow: “cede9808-60db-4fa2-bfc8-6aed38e802d9”

File metadata in submission envelope:

https://api.ingest.integration.data.humancellatlas.org/submissionEnvelopes/5d9e24c223c41d0008bad6ca/files

Optimus:

Cromwell workflow: “e1dc764e-e365-4259-a65c-0285317ef9f6”

File metadata in submission envelope:

https://api.ingest.integration.data.humancellatlas.org/submissionEnvelopes/5d9dd6b923c41d0008bad3dd/files
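Because the file metadata is paginated, the QA check above could be scripted rather than clicked through. The sketch below is only an illustration: it assumes a HAL-style JSON response with `_embedded.files` and a `_links.next.href` link, and it assumes each file record exposes `fileName` and `content.file_core.format`. None of those field names are confirmed by this thread, so adjust them to whatever the endpoint actually returns. `find_unknown_files` and `fetch_json` are hypothetical names.

```python
def find_unknown_files(first_url, fetch_json):
    """Follow "next" links page by page and collect files whose format is "unknown".

    fetch_json is any callable that takes a URL and returns the decoded JSON page.
    All field names below are assumptions about the response shape.
    """
    unknown = []
    url = first_url
    while url:
        page = fetch_json(url)
        for record in page.get("_embedded", {}).get("files", []):
            file_core = record.get("content", {}).get("file_core", {})
            if file_core.get("format") == "unknown":
                unknown.append(record.get("fileName"))
        # Stop when the page has no "next" link.
        url = page.get("_links", {}).get("next", {}).get("href")
    return unknown


if __name__ == "__main__":
    # Two fake pages standing in for a real paginated listing.
    fake_pages = {
        "page1": {
            "_embedded": {"files": [
                {"fileName": "a.bam", "content": {"file_core": {"format": "bam"}}},
                {"fileName": "b.npy", "content": {"file_core": {"format": "unknown"}}},
            ]},
            "_links": {"next": {"href": "page2"}},
        },
        "page2": {
            "_embedded": {"files": [
                {"fileName": "c.csv.gz", "content": {"file_core": {"format": "csv.gz"}}},
            ]},
            "_links": {},
        },
    }
    print(find_unknown_files("page1", fake_pages.__getitem__))
```

Passing the fetcher in as a callable keeps the sketch independent of any HTTP client; against a live endpoint it could be a small wrapper around whichever client is already in use. An empty result means no page contained an “unknown” file type.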

kbergin commented 4 years ago

➤ Chengchen Wang commented:

QAed!

kbergin commented 4 years ago

➤ Saman Ehsan commented:

Thanks!