Closed — rj678 closed this pull request 2 years ago
Thank you for creating this pull request and the cool TODO list.
One comment already: I think it would be nice to add this feature in its own dedicated accessories folder. The accessories folder contains code for external software/packages etc. to keep everything well organized. I think it would be great to have an s3 folder in accessories where we can put all your work.
Best wishes, Mart
Hi Mart, I've separated out the code to read images and annotations into the accessories folder (separate folders for images and annotations within accessories). I'll clean up the code, but thought I'd get your feedback first and check whether this is along the lines of what you recommended. Thanks!
This already looks pretty cool!
I have some comments though. I don't think we need to change the WholeSlideAnnotation class by introducing a storage_source. The only problem is that the path does not exist, but I think we can solve this by checking what kind of parser we are given. For example, we can create a class CloudAnnotationParser and check if the given parser is of that type; if so, we check whether the path exists in another way, maybe using the requests lib. We can make a class method for that in CloudAnnotationParser.
Also, I think it would be nice to make an S3AnnotationParser; for now, we can start with an S3AsapAnnotationParser that inherits from CloudAnnotationParser and AsapAnnotationParser. We have to change the AsapAnnotationParser a bit: I think we can make a method _open_annotation that opens the annotations. For the S3AsapAnnotationParser, we then only need to override this method with the code that you already made.
What do you think about this?
I can also add some code if you like; you can allow me to do so as described here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork
Either way is fine for me :).
Please let me know if you have any questions.
Thanks so much for the detailed feedback, Mart; I'll follow up on it. You should be able to commit code to this PR's branch, and it'd be amazing if you could add some code. Looking forward to collaborating on this!
Hi Mart,
I've pushed the following changes:
- create CloudAnnotationParser, with a class method that checks if the s3 file exists
- create the method _open_annotation in the existing asap parser
- create and register S3AsapAnnotationParser, which inherits from CloudAnnotationParser and AsapAnnotationParser and overrides the _open_annotation method
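The existence check that motivated these changes can then branch on the parser type instead of a new storage_source argument. The helper below is a hypothetical sketch of that idea; check_annotation_path and the startswith-based remote check are illustrative stand-ins, not the library's real API.

```python
import os


class CloudAnnotationParser:
    @classmethod
    def path_exists(cls, path):
        # Stand-in for the real remote check (e.g. an HTTP HEAD request
        # or an S3 head_object call in the actual implementation).
        return path.startswith(("s3://", "http://", "https://"))


def check_annotation_path(path, parser):
    """Verify the annotation source exists: remotely for cloud parsers,
    via the local filesystem for everything else."""
    if isinstance(parser, CloudAnnotationParser):
        return type(parser).path_exists(path)
    return os.path.exists(path)
```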
I've tested the changes. The code can still be cleaned up, but it'd be great to get your feedback on whether this is along the lines of what you expected. Thank you!
Dear Rishi,
Thank you so much for your work!
I made some small changes and pushed them. Please check them, and if you agree, I think this feature can go into the main branch. We might reduce the redundancy between the OpenSlide backend and the TiffSlide backend, but we can look at this later. I think for now this is fine.
Hi Mart, thanks so much for the commits. Please let me know if you'd like any further changes for now. Next, I plan to use this to train a model by reading data directly from s3. Thanks so much again!
Dear Rishi,
I think this is a great contribution, and I will merge this pull request. I tested your notebook locally, and the loading of images from s3 took quite some time. Please note that if you train locally with the new image backend, it is pretty slow; I highly recommend using the data directly from AWS only if you use something like SageMaker. Furthermore, I noticed that on Google Colab you used Python 3.7. Please also note that the batch iterator requires at least Python 3.8, because it uses shared memory to speed up the loading of the data.
I will merge your pull request now. Thanks again for your contribution!
Thanks so much! I'll keep these points in mind. Looking forward to using this library some more!
Draft PR to add functionality to read images directly from the s3 bucket.
Features:
- asap parser
Additional Libraries required:
TODOs based on initial feedback: