reubano opened this issue 6 years ago
Once #237 is merged, I'd like to work on "segmentation test(s) should pass for luna scans".
I took a quick glance at the identification problems. When running the tests, I get a `RuntimeError: $ Torch: not enough memory: you tried to allocate 21GB. Buy new RAM! at /pytorch/torch/lib/TH/THGeneral.c:270`. This basically means that our client / web server (depending on where the project will be deployed in the end) needs at least 21 GB of RAM. Is it realistic to assume that this will be the case? Otherwise we might need to compress the model a bit ...
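For a sense of scale, here is a back-of-the-envelope sketch of how quickly dense float32 volumes consume RAM; the shape below is a made-up example for illustration, not the model's actual input:

```python
import numpy as np

def array_gib(shape, dtype=np.float32):
    """Memory footprint of a dense array with the given shape, in GiB."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1024 ** 3

# Illustrative shape only: one batch of full-resolution CT sub-volumes.
print(f"{array_gib((8, 1, 512, 512, 256)):.1f} GiB")  # -> 2.0 GiB
# Intermediate layer activations multiply this several times over,
# which is how a single forward pass can balloon to tens of GB.
```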
PS: :+1: for providing tests that need to pass :)
> This basically means that our client / web server (depending on where the project will be deployed in the end) needs at least 21 GB of RAM. Is it realistic to assume that this will be the case?
I'd be inclined to say 'no', but I suppose @isms or @pjbull would know for sure.
Per the design doc, the goal here is running the app on premise on a local computer, not worrying so much about optimizing to move lots of packets across networks. We can assume a workstation with a reasonable amount of RAM.
For practical purposes we are running the demo on a machine without unlimited RAM, so it's not boundless. But it's an AWS c5.large or something similar, so RAM is not in short supply.
@reubano could you do me a favor and check out my branch 229_fix_segmentation? All I did was apply the patch for the segmentation tests and merge in the current master. The outcome was that all segmentation tests passed (I also ran the slow tests):
➜ concept-to-clinic git:(229_fix_segmentation) ✗ docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_segmentation.py
Starting base ...
Starting base ... done
======================================== test session starts ========================================
platform linux -- Python 3.6.1, pytest-3.1.3, py-1.4.34, pluggy-0.4.0 -- /usr/bin/python3.6
cachedir: .cache
rootdir: /app, inifile:
collected 7 items
src/tests/test_segmentation.py::test_correct_paths PASSED
src/tests/test_segmentation.py::test_segment_predict_load PASSED
src/tests/test_segmentation.py::test_segment_dicom PASSED
src/tests/test_segmentation.py::test_segment_luna PASSED
src/tests/test_segmentation.py::test_nodule_segmentation PASSED
src/tests/test_segmentation.py::test_lung_segmentation PASSED
src/tests/test_segmentation.py::test_stop_timeout PASSED
==================================== 7 passed in 514.55 seconds =====================================
I think the issues behind the segmentation tests that were failing when this issue was opened may have been resolved by #237. Can you confirm?
Leaving open until all sub-issues are solved.
@isms I understand that RAM usage is not a hard constraint. However, this makes it impossible for me (and others with less than 32 GB of RAM) to debug (and improve, as described in #130) the model. Do you see any possibility to address this problem?
> For practical purposes we are running the demo on a machine without unlimited RAM, so it's not boundless. But it's an AWS c5.large or something similar, so RAM is not in short supply.
According to the c5 spec page, a c5.large only has 4 GB of RAM; the c5.4xlarge, however, has 32 GB. The c5.4xlarge is priced at $0.680/hr. Alternatively, the r4.xlarge has less compute power (vs the c5.4xlarge) but a similar amount of RAM, for $0.266/hr.

@WGierke looks like your best bet may be to try spinning up an EC2 instance.
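To make the trade-off concrete, here is a trivial cost estimate using the on-demand rates quoted above (the 160-hour month is an assumed usage pattern, not a measured figure):

```python
# On-demand hourly rates quoted in the comment above.
rates = {"c5.4xlarge": 0.680, "r4.xlarge": 0.266}

HOURS_PER_MONTH = 8 * 20  # assumed: ~8 hours/day, 20 working days

for instance, hourly in rates.items():
    monthly = hourly * HOURS_PER_MONTH
    print(f"{instance}: ${hourly:.3f}/hr -> ~${monthly:.2f} per dev-month")
```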
> I understand that RAM usage is not a hard constraint. However, this makes it impossible for me (and others with less than 32 GB of RAM) to debug (and improve, as described in #130) the model. Do you see any possibility to address this problem?
@WGierke I know that @caseyfitz (on our team) is looking into the models, and we can potentially try to report some accuracy statistics from the status quo model in case that informs a parallel approach that uses less memory. That being said, we don't know how long that would take, so if you are blocked by the system requirements, I'm not sure what to recommend other than using a cloud VM with higher specs, as @reubano suggested.
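In the meantime, a couple of generic PyTorch tactics can cap peak inference memory. A minimal sketch, assuming a recent PyTorch; the stand-in network and shapes below are made up for illustration and are not the project's actual segmentation model:

```python
import torch
import torch.nn as nn

# Stand-in 3D network and input; shapes are illustrative only.
model = nn.Conv3d(1, 1, kernel_size=3, padding=1).eval()
volume = torch.randn(64, 1, 32, 32, 32)  # pretend batch of CT sub-volumes

with torch.no_grad():            # skip autograd buffers at inference time
    chunks = volume.split(8)     # run the batch through in small chunks ...
    outputs = [model(c) for c in chunks]
result = torch.cat(outputs)      # ... and reassemble, bounding peak memory
```

Half-precision weights (model.half()) can roughly halve the footprint as well, at some cost in numerical precision.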
@isms Alright, thanks a lot for the insights and the feedback. I'll have a look at how I can solve this problem on my end. Thanks!
My 2 cents, as someone who is lurking and interested in contributing, is to lower the barrier to entry as much as possible. If AWS is the answer, can we use Terraform or CloudFormation to create a template that makes it easy to spin up a dev instance, or an instance that runs the test suite? And make sure the costs of any such template are well documented so people know what they're getting into ahead of time.
Also, I have experience with Terraform and ECS if you want me to try to put something like that together.
@Raab70 That sounds like quite a bit of "meta" work with the limited amount of time left in the competition! That being said if you think it would be helpful for you to do so, feel free to give it a try and report back.
Overview
Currently, the classification algorithm is tested with a LUNA scan, and the segmentation and identification algorithms are tested with DICOM scans. This suggests there may be incompatibilities between DICOM and LUNA scans that make them fail with certain algorithms. I confirmed this by adding new tests for the complementary scan types, which produced the tracebacks listed under Technical details below.
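A minimal sketch of how the complementary tests might be parametrized over both scan types; the import path and image paths below are assumptions for illustration, not necessarily the repo's actual layout:

```python
import pytest

# Hypothetical fixture paths -- substitute the repo's real test images.
SCAN_PATHS = {
    "dicom": "/images/LIDC-IDRI-0001",
    "luna": "/images/luna/scan.mhd",
}

@pytest.mark.parametrize("scan_type", sorted(SCAN_PATHS))
def test_segment_any_scan_type(scan_type):
    # Assumed import path; point this at the algorithm under test.
    from src.algorithms.segment.trained_model import predict
    assert predict(SCAN_PATHS[scan_type], []) is not None
```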
Expected Behavior
All algorithm tests should pass
Technical details
Below are the specific tracebacks:
- test_classification.py
- test_segmentation.py
- test_identification.py
Steps to Reproduce
In order to replicate these results, you must first apply the following patch file.
Acceptance criteria
NOTES: ... means someone is working on a PR