reubano opened this issue 6 years ago
Once #237 is merged, I'd like to work on "segmentation test(s) should pass for luna scans".
I took a quick glance at the identification problems. When running the tests, I get a `RuntimeError: $ Torch: not enough memory: you tried to allocate 21GB. Buy new RAM! at /pytorch/torch/lib/TH/THGeneral.c:270`. This basically means that our client / web server (depending on where the project will be deployed in the end) needs at least 21 GB of RAM. Is it realistic to assume that this will be the case? Otherwise we might need to compress the model a bit ...
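For a sense of scale, here is a back-of-the-envelope sketch of how quickly dense float32 volumes consume RAM; the shape below is a made-up example for illustration, not the model's actual input:

```python
import numpy as np

def array_gib(shape, dtype=np.float32):
    """Memory footprint of a dense array with the given shape, in GiB."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1024 ** 3

# Illustrative shape only: one batch of full-resolution CT sub-volumes.
print(f"{array_gib((8, 1, 512, 512, 256)):.1f} GiB")  # -> 2.0 GiB
# Intermediate layer activations multiply this several times over,
# which is how a single forward pass can balloon to tens of GB.
```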
PS: :+1: for providing tests that need to pass :)
> This basically means that our client / web server (depending on where the project will be deployed in the end) needs at least 21 GB of RAM. Is it realistic to assume that this will be the case?
I'd be inclined to say 'no', but I suppose @isms or @pjbull would know for sure.
Per the design doc, the goal here is running the app on premise on a local computer, not worrying so much about optimizing to move lots of packets across networks. We can assume a workstation with a reasonable amount of RAM.
For practical purposes we are running the demo on a machine without unlimited RAM, so it's not boundless. But it's an AWS c5.large or something similar, so RAM is not in short supply.
@reubano could you do me a favor and check out my branch 229_fix_segmentation? All I did was apply the patch for the segmentation tests and merge in the current master. The outcome was that all segmentation tests passed (I also ran the slow tests):
➜ concept-to-clinic git:(229_fix_segmentation) ✗ docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_segmentation.py
Starting base ...
Starting base ... done
======================================== test session starts ========================================
platform linux -- Python 3.6.1, pytest-3.1.3, py-1.4.34, pluggy-0.4.0 -- /usr/bin/python3.6
cachedir: .cache
rootdir: /app, inifile:
collected 7 items
src/tests/test_segmentation.py::test_correct_paths PASSED
src/tests/test_segmentation.py::test_segment_predict_load PASSED
src/tests/test_segmentation.py::test_segment_dicom PASSED
src/tests/test_segmentation.py::test_segment_luna PASSED
src/tests/test_segmentation.py::test_nodule_segmentation PASSED
src/tests/test_segmentation.py::test_lung_segmentation PASSED
src/tests/test_segmentation.py::test_stop_timeout PASSED
==================================== 7 passed in 514.55 seconds =====================================
I think the issues behind the segmentation tests that were failing when this issue was opened may have been resolved by #237. Can you confirm?
Leaving open until all sub-issues are solved.
@isms I understand that RAM usage is not a hard constraint. However, this makes it impossible for me (and others with less than 32 GB of RAM) to debug (and improve, as described in #130) the model. Do you see any possibility to address this problem?
> For practical purposes we are running the demo on a machine without unlimited RAM, so it's not boundless. But it's an AWS c5.large or something similar, so RAM is not in short supply.
According to the c5 spec page, a c5.large only has 4 GB of RAM; the c5.4xlarge, however, has 32 GB. The c5.4xlarge is priced at $0.680/hr. Alternatively, the r4.xlarge has less compute power (vs the c5.4xlarge) but a similar amount of RAM, for $0.266/hr.

@WGierke looks like your best bet may be to try spinning up an EC2 instance.
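To make the trade-off concrete, here is a trivial cost estimate using the on-demand rates quoted above (the 160-hour month is an assumed usage pattern, not a measured figure):

```python
# On-demand hourly rates quoted in the comment above.
rates = {"c5.4xlarge": 0.680, "r4.xlarge": 0.266}

HOURS_PER_MONTH = 8 * 20  # assumed: ~8 hours/day, 20 working days

for instance, hourly in rates.items():
    monthly = hourly * HOURS_PER_MONTH
    print(f"{instance}: ${hourly:.3f}/hr -> ~${monthly:.2f} per dev-month")
```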
> I understand that RAM usage is not a hard constraint. However, this makes it impossible for me (and others with less than 32 GB of RAM) to debug (and improve, as described in #130) the model. Do you see any possibility to address this problem?
@WGierke I know that @caseyfitz (on our team) is looking into the models, and we can potentially try to report some accuracy statistics from the status quo model in case that informs a parallel approach that uses less memory. That being said, we don't know how long that would take, so if you are blocked by the system requirements, I'm not sure what to recommend other than using a cloud VM with higher specs, as @reubano suggested.
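In the meantime, a couple of generic PyTorch tactics can cap peak inference memory. A minimal sketch, assuming a recent PyTorch; the stand-in network and shapes below are made up for illustration and are not the project's actual segmentation model:

```python
import torch
import torch.nn as nn

# Stand-in 3D network and input; shapes are illustrative only.
model = nn.Conv3d(1, 1, kernel_size=3, padding=1).eval()
volume = torch.randn(64, 1, 32, 32, 32)  # pretend batch of CT sub-volumes

with torch.no_grad():            # skip autograd buffers at inference time
    chunks = volume.split(8)     # run the batch through in small chunks ...
    outputs = [model(c) for c in chunks]
result = torch.cat(outputs)      # ... and reassemble, bounding peak memory
```

Half-precision weights (model.half()) can roughly halve the footprint as well, at some cost in numerical precision.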
@isms Alright, thanks a lot for the insights and the feedback. I'll have a look at how I can solve this problem on my end. Thanks!
My 2 cents, as someone who is lurking and interested in contributing, is to lower the barrier to entry as much as possible. If AWS is the answer, can we use Terraform or CloudFormation to create a template that makes it easy to spin up a dev instance, or an instance that runs the test suite? And make sure the costs of any such template are well documented so people know what they're getting into ahead of time.
Also, I have experience with Terraform and ECS if you want me to try to put something like that together.
@Raab70 That sounds like quite a bit of "meta" work with the limited amount of time left in the competition! That being said if you think it would be helpful for you to do so, feel free to give it a try and report back.
Overview
Currently, the classification algorithm is tested with a LUNA scan, and the segmentation and identification algorithms are tested with DICOM scans. This suggests there may be incompatibilities between DICOM and LUNA scans that make them fail with certain algorithms. I confirmed this by adding new tests for the complementary scan types, which produced the tracebacks listed under Technical details below.
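A minimal sketch of how the complementary tests might be parametrized over both scan types; the import path and image paths below are assumptions for illustration, not necessarily the repo's actual layout:

```python
import pytest

# Hypothetical fixture paths -- substitute the repo's real test images.
SCAN_PATHS = {
    "dicom": "/images/LIDC-IDRI-0001",
    "luna": "/images/luna/scan.mhd",
}

@pytest.mark.parametrize("scan_type", sorted(SCAN_PATHS))
def test_segment_any_scan_type(scan_type):
    # Assumed import path; point this at the algorithm under test.
    from src.algorithms.segment.trained_model import predict
    assert predict(SCAN_PATHS[scan_type], []) is not None
```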
Expected Behavior
All algorithm tests should pass
Technical details
Below are the specific tracebacks:
- test_classification.py
- test_segmentation.py
- test_identification.py
Steps to Reproduce
In order to replicate these results, you must first apply the following patch file.
Acceptance criteria
NOTES: ... means someone is working on a PR