Kitware / UPennContrast

UPenn ?
https://upenn-contrast.netlify.com/
Apache License 2.0

S3 assetstore doesn't work #754

Open arjunrajlab opened 1 month ago

arjunrajlab commented 1 month ago

Appears that the file can be put onto the assetstore, but for some reason, it is not recognized and converted to a large image.

manthey commented 1 month ago

Curiously, when we upload to an S3 bucket, it takes a small amount of time before we can read back the data (maybe 0.9 seconds?). As soon as the upload is done, we try to make the uploaded file a large_image; this fails because we can't actually read it (we get a response saying the key doesn't exist), so the rest of the processing steps don't occur. If I manually hack in a wait loop until we can read the data, it does work. I'll try to convert that hack to something less hacky.

arjunrajlab commented 1 month ago

I was looking it up, and it seems that S3 has an eventual consistency model that may be the culprit? It seems that you get instant read-after-write for a PUT of a new object, but if you modify or delete an object, there can be a delay. Perhaps there is some modification going on here instead of the creation of a new object?

manthey commented 1 month ago

I'll have a PR in large_image shortly that checks that the size of file reported by S3 is the size we think we uploaded and waits if it isn't. In my testing, the delay to and from S3 is certainly variable -- sometimes it seems to be a few milliseconds but for a little while it was getting close to a second.
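A minimal sketch of that kind of size check, for illustration only; it is not the code from the PR. The function name `wait_for_expected_size` and its parameters are hypothetical, and `get_size` stands in for whatever call asks the assetstore for the stored size (for S3 that could be a `head_object` request wrapped so that a missing key returns `None`).

```python
import time


def wait_for_expected_size(get_size, expected_size, timeout=5.0,
                           interval=0.05, sleep=time.sleep):
    """Poll until the store reports the size we uploaded, or give up.

    get_size: callable returning the stored size in bytes, or None if the
              key is not visible yet (hypothetical wrapper around the store).
    Returns True if the expected size was seen before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_size() == expected_size:
            return True
        if time.monotonic() >= deadline:
            return False
        # Back off briefly before asking the store again.
        sleep(interval)


if __name__ == "__main__":
    # Simulate a store that needs two polls before the object is complete.
    reported = iter([None, 0, 1024])
    ok = wait_for_expected_size(lambda: next(reported), 1024)
    print("readable" if ok else "timed out")
```

Bounding the loop with a timeout means a file that never shows up is reported as a failure instead of blocking the upload handler indefinitely.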

manthey commented 1 month ago

See https://github.com/girder/large_image/pull/1579

arjunrajlab commented 1 month ago

Awesome! So… it works-ish. If I upload a file, it still gets hung up at the same point:

image

Now, however, if I refresh the page, it goes through:

image

So I guess we need something on the front end to wait for girder/large_image? Will we need that in other parts of the interface, too?

arjunrajlab commented 1 month ago

As a related question: let's say a file grew "old" and S3 was configured to move it into Glacier. Then, I guess we would have to initiate a retrieval request and poll periodically to see whether the file has come back. Would that be mostly front end stuff, or can it be built into Girder for the most part? Just curious at this stage.

manthey commented 1 month ago

It depends on how AWS exposes the file. If the Glacier storage is a different assetstore, then when the file is moved to Glacier, we'd need some trigger to update the file record in Mongo to point to the Glacier assetstore and have the appropriate key in that assetstore. If it is the same file, then the main issue is that when you ask for a file in Glacier, it can take hours before it is available, so on the first query it will appear broken and will only work after those hours have passed.

arjunrajlab commented 1 month ago

My impression is that it is the latter, meaning that the file looks to be there, but you need to wait for it to come back to actually get it. One option is to manually restore the files and just have something that warns the user "File in deep storage; request for it to be restored" or something like that.
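A rough sketch of how such a warning could be decided, assuming the object's metadata comes from an S3 `head_object` call (for example via boto3), whose response can include a `StorageClass` field and, once a restore has been requested, a `Restore` string such as `ongoing-request="true"`. The function `archive_state` and its return values are hypothetical and are not part of Girder or large_image.

```python
# Storage classes that need a restore before the object can be read.
ARCHIVED_CLASSES = ("GLACIER", "DEEP_ARCHIVE")


def archive_state(head):
    """Classify an object from a head_object-style response dict.

    Returns one of:
      "available" - the object can be read now
      "restoring" - a restore was requested and is still in progress
      "archived"  - the object is in deep storage and no restore is pending
    """
    # S3 omits StorageClass for objects in the standard class.
    storage_class = head.get("StorageClass", "STANDARD")
    if storage_class not in ARCHIVED_CLASSES:
        return "available"
    restore = head.get("Restore")
    if restore is None:
        return "archived"
    if 'ongoing-request="true"' in restore:
        return "restoring"
    # A finished restore leaves a temporary readable copy.
    return "available"


if __name__ == "__main__":
    print(archive_state({"StorageClass": "GLACIER"}))
```

With something like this on the server side, the interface would only need to show "File in deep storage" for `"archived"` and a "restore in progress" message for `"restoring"`.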

arjunrajlab commented 1 month ago

@bruyeret I think we need to update the front end to wait a little to get the image back to address the above issue, thanks!