NHMDenmark / DaSSCo-asset-service

DaSSCo asset service is part of DaSSCo storage system

Assets stuck with ASSET_RECEIVED status #45

Open Baeist opened 4 months ago

Baeist commented 4 months ago

Two assets, ucloud-test-21 and ucloud-test-27, both seem to be stuck after trying to sync them with ERDA. This is the response from get asset status:

{ "asset_guid": "ucloud-test-21", "parent_guid": null, "error_timestamp": null, "status": "ASSET_RECEIVED", "error_message": null, "share_allocation_mb": 611 }

The issue is that the status hasn't changed to ERDA_ERROR (or actually syncing). There just seems to be no progress with them. They can likely be resynced manually, since as far as I can tell there don't seem to be any other issues with them. We need a way to tell that this is the case, though, rather than assuming they are just waiting normally to sync.
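One possible way to tell stuck assets apart from ones that are waiting normally is to compare repeated status responses on the client side. The sketch below is only an illustration under assumptions: `Observation`, `find_stuck`, and the 10 minute threshold are hypothetical names and values, not part of the asset service, and the observations would have to be collected by whoever polls the status endpoint.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Observation:
    """One polled status response (hypothetical client-side record)."""
    asset_guid: str
    status: str
    observed_at: datetime


def find_stuck(observations, threshold=timedelta(minutes=10)):
    """Return guids whose latest status is ASSET_RECEIVED and has not
    changed for at least `threshold` (an assumed cut-off)."""
    since = {}   # guid -> (current status, time it was first seen in this run)
    latest = {}  # guid -> time of the most recent observation
    for obs in sorted(observations, key=lambda o: o.observed_at):
        previous = since.get(obs.asset_guid)
        if previous is None or previous[0] != obs.status:
            # Status changed (or first sighting): restart the clock.
            since[obs.asset_guid] = (obs.status, obs.observed_at)
        latest[obs.asset_guid] = obs.observed_at
    return sorted(
        guid
        for guid, (status, first_seen) in since.items()
        if status == "ASSET_RECEIVED" and latest[guid] - first_seen >= threshold
    )


def _t(hour, minute):
    # Helper for building example timestamps (made-up polling times).
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


example = [
    Observation("ucloud-test-21", "ASSET_RECEIVED", _t(8, 0)),
    Observation("ucloud-test-21", "ASSET_RECEIVED", _t(8, 15)),
    Observation("example-ok", "ASSET_RECEIVED", _t(8, 0)),   # hypothetical asset
    Observation("example-ok", "COMPLETED", _t(8, 5)),
]
stuck = find_stuck(example)
print(stuck)
```

This only shows that the status has not moved between polls. It cannot tell whether the files have already reached ERDA.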

@bhsi-snm

Baeist commented 1 month ago

Assets have been stuck for more than 10 minutes with the same status and error message, which indicates that no resync is being attempted. I am not sure which timezone the ARS uses; if it is UTC+0 or GMT, this would indicate that the assets have been stuck like this for ~1 hour.

Date: Wed, 21 Aug 2024 08:32:15 GMT
{ "asset_guid": "ucloud-test-458", "parent_guid": null, "error_timestamp": "2024-08-21T07:45:41.355Z", "status": "ASSET_RECEIVED", "error_message": null, "share_allocation_mb": 611 }

Date: Wed, 21 Aug 2024 08:42:36 GMT
{ "asset_guid": "ucloud-test-458", "parent_guid": null, "error_timestamp": "2024-08-21T07:45:41.355Z", "status": "ASSET_RECEIVED", "error_message": null, "share_allocation_mb": 611 }
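The "~1 hour" estimate can be checked by subtracting `error_timestamp` from the `Date` header of each response. This is a small sketch that assumes the trailing `Z` on `error_timestamp` really means UTC, which is the open question above; `stuck_duration` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def stuck_duration(response_date: str, error_timestamp: str):
    """Time between the asset's error_timestamp and the HTTP Date header."""
    observed = parsedate_to_datetime(response_date)  # aware datetime for "GMT"
    # Assumption: the trailing "Z" means the timestamp is in UTC.
    recorded = datetime.strptime(
        error_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return observed - recorded


first = stuck_duration("Wed, 21 Aug 2024 08:32:15 GMT", "2024-08-21T07:45:41.355Z")
second = stuck_duration("Wed, 21 Aug 2024 08:42:36 GMT", "2024-08-21T07:45:41.355Z")
print(first, second)
```

Under that assumption the two responses are roughly 47 and 57 minutes after the recorded timestamp.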

Baeist commented 1 month ago

Did another test yesterday with 60 assets. I did run into the issue of assets getting stuck for a long time waiting to sync; a decent number of them (10ish) were affected. Today I've been running other tests, and it turns out the stuck assets did eventually get synced and had their status updated, so they completed their pipelines when I ran the other tests. This is still an issue, since we at least need to know when this can be expected to happen. The exception is 2 assets that are stuck with ASSET_RECEIVED but have actually been synced, since their files are in ERDA. The asset guids are "ucloud-test-613" and "ucloud-test-654".

Baeist commented 1 month ago

Another example of an asset that has been synced with ERDA but still has this status. Note that the fileshare has also been closed: { "asset_guid": "ucloud-test-669", "parent_guid": null, "error_timestamp": null, "status": "ASSET_RECEIVED", "error_message": null, "share_allocation_mb": null }

Baeist commented 2 weeks ago

The only way to change the status of an asset to COMPLETED is to reopen the share and sync again. This is expensive since it moves the file(s) twice more.

Baeist commented 6 days ago

dev-ucloud-77_400 is a recent example of this