PathologyDataScience / BCSS

Use this to download all elements of the BCSS dataset described in: Amgad M, Elfandy H, ..., Gutman DA, Cooper LAD. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics. 2019. doi: 10.1093/bioinformatics/btz083
MIT License

502 Bad Gateway #17

Closed player1321 closed 3 years ago

player1321 commented 3 years ago

Hello,

I ran python download_crowdsource_dataset.py but got 502 Bad Gateway:

girder_client.HttpError: HTTP error 502: POST https://demo.kitware.com/histomicstk/api/v1/api_key/token?key=n0Kp1ez8YOnOiWNoACryzeBlIzbUDW3iOD2DmPLI
Response text: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.0 (Ubuntu)</center>
</body>
</html>

Is the data not available now?

mostafajahanifar commented 3 years ago

Hi, I have the same issue here. It seems that the HistomicsTK server is not working at the moment, although it's odd that it has been down for at least three days now. @kheffah, can you please confirm whether this is the issue? Thanks,
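For brief outages (not a multi-day one like this), wrapping the failing call in a retry with exponential backoff may be enough. Here is a minimal sketch using only the standard library; `call_with_retries` and its parameters are hypothetical names of mine and are not part of `girder_client` or the download script.

```python
import time


def call_with_retries(func, attempts=4, base_delay=1.0,
                      is_transient=lambda exc: True, sleep=time.sleep):
    """Call func() and retry on failure, doubling the wait each time.

    Hypothetical helper: attempts, base_delay and is_transient are
    illustrative parameters, not taken from the BCSS repository.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt, or when the error is not
            # one that a retry could plausibly fix.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

With `girder_client`, `func` could be something like `lambda: gc.authenticate(apiKey=...)`, and `is_transient` could check `getattr(exc, "status", None) in (502, 503, 504)`. That assumes the raised error exposes a `status` attribute, which I believe `girder_client.HttpError` does, but please verify against your installed version.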

kheffah commented 3 years ago

Dear @player1321 and @mostafajahanifar, thank you for raising this issue. It seems the Kitware server has been down a few times lately. To avoid disruptions, the dataset can now also be downloaded directly at 0.25 MPP, using this direct link.

mostafajahanifar commented 3 years ago

That is also color normalized. Wonderful! Thanks for the prompt reply, Mohamed.

mostafajahanifar commented 3 years ago

Sorry @kheffah, I understand you have closed this issue, but could you kindly provide the information on the train/validation/test splits as well?

kheffah commented 3 years ago

@mostafajahanifar You are more than welcome. I like to separate training and testing sets by hospital, since this better reflects how the model generalizes to external data. The train/test split used for the model in our paper is discussed here. Recently, I've switched to internal-external cross-validation, where the hospitals that constitute the testing set are switched around to provide some variance around the accuracy metric. For example, see how we split the train-test sets for the NuCLS paper here.

Note that some hospitals have more slides than others. In my recent projects, I make sure each fold has at least one "big" hospital, meaning one with at least 9 slides. Note also that the slide name encodes the hospital name, so the slide TCGA-E2-A14X-DX1, for example, comes from the hospital E2.

I hope this answers your question. Let me know if you need any clarifications.
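To make the hospital-based splitting concrete, here is a rough sketch, not the exact code behind the paper or NuCLS splits. It reads the hospital code from the slide name and deals hospitals into folds so that each fold gets at least one "big" hospital. The function names, the round-robin assignment, and the `big_threshold` parameter are my own assumptions for illustration.

```python
from collections import defaultdict


def hospital_of(slide_name):
    """Return the hospital code, e.g. 'E2' for 'TCGA-E2-A14X-DX1'."""
    parts = slide_name.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"Unexpected slide name: {slide_name!r}")
    return parts[1]


def group_by_hospital(slide_names):
    """Map each hospital code to the list of its slide names."""
    groups = defaultdict(list)
    for name in slide_names:
        groups[hospital_of(name)].append(name)
    return dict(groups)


def make_folds(slide_names, n_folds, big_threshold=9):
    """Assign hospitals to n_folds test folds (hypothetical scheme).

    Big hospitals (>= big_threshold slides) are dealt out first, one
    per fold, so every fold contains at least one of them. The rest
    follow in round-robin order.
    """
    groups = group_by_hospital(slide_names)
    big = sorted(h for h, s in groups.items() if len(s) >= big_threshold)
    small = sorted(h for h, s in groups.items() if len(s) < big_threshold)
    if len(big) < n_folds:
        raise ValueError("Not enough big hospitals for one per fold")
    folds = [[] for _ in range(n_folds)]
    for i, hospital in enumerate(big + small):
        folds[i % n_folds].append(hospital)
    return folds
```

Each fold lists the hospitals held out for testing; the slides from all remaining hospitals form the training set for that fold.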

mostafajahanifar commented 3 years ago

Thank you very much @kheffah for your detailed explanations. I could not agree more with the new internal-external cross-validation scheme that you are using. However, for this particular purpose I want to compare my model's performance with your baseline, and you have pointed me to the information I need for that. Again, I appreciate your help.

kheffah commented 3 years ago

@mostafajahanifar You're more than welcome. Let me know if you need anything else.

mostafajahanifar commented 3 years ago

@player1321 and @kheffah, just to let you know, I have contacted the Kitware people and they have fixed the problem with the website. So I guess the dataset download code can be used without problems now.

kheffah commented 3 years ago

@mostafajahanifar Thank you! That was very nice of you.