SBU-BMI / wsinfer

🔥 🚀 Blazingly fast pipeline for patch-based classification in whole slide images
https://wsinfer.readthedocs.io
Apache License 2.0
55 stars 9 forks source link

changed GeoJSON format #179

Closed swaradgat19 closed 11 months ago

swaradgat19 commented 11 months ago

fixes #178

swaradgat19 commented 11 months ago

The test failed because of the assert condition that checks if the id is equal to PathTileObject. Putting in another commit to change that to 49069bbb-a0ac-4343-96fe-15af45c30634

swaradgat19 commented 11 months ago

Added paquo in pyproject.toml file

swaradgat19 commented 11 months ago

@kaczmarj Sure. I was confused about the id too. This makes sense. I will do the changes

kaczmarj commented 11 months ago

thanks @swaradgat19 . i made two commits to fix two things.

the geojson dict still needs to be be updated to the following format

"properties": {"isLocked": true, "measurements": [{"name": "prob_notumor", "value": 0.9998319149017334}, {"name": "prob_tumor", "value": 0.0001680454879533}]}
swaradgat19 commented 11 months ago

Yes. I'm updating the geojson dict

swaradgat19 commented 11 months ago

I believe this is the earlier format. In the issue, you have mentioned this format:

{"geometry": {"coordinates": [[[0, 23904], [350, 23904], [350, 24254], [0, 24254], [0, 23904]]], "type": "Polygon"}, "id": "49069bbb-a0ac-4343-96fe-15af45c30634", "properties": {"isLocked": true, "measurements": {"prob_notumor": 0.9998319149017334, "prob_tumor": 0.00016804548795334995}, "objectType": "tile"}, "type": "Feature"}

Measurements has prob_notumor and prob_tumor as keys, instead of name and value

kaczmarj commented 11 months ago

yes that's true, thanks. thanks for catching that

swaradgat19 commented 11 months ago

@kaczmarj I realised that there are many types of keys like prob_notumor, prob_Micropapillary, prob_Benign, etc. What I did is, I'm making the measurements dictionary like this :

def _row_to_geojson(row: pd.Series, prob_cols: list[str]) -> dict:
    """Convert information about one tile to a single GeoJSON feature."""
    minx, miny, width, height = row["minx"], row["miny"], row["width"], row["height"]
    coords = _box_to_polygon(minx=minx, miny=miny, width=width, height=height)
    prob_dict = row[prob_cols].to_dict()
    # measurements = [{"name": k, "value": v} for k, v in prob_dict.items()]

    measurements = [{f"{k}" : v } for k,v in prob_dict.items()]

This gave me the following measurements:

measurements --> [{'prob_Lepidic': 1.4467824257735629e-05}, {'prob_Benign': 0.9999827146530152}, {'prob_Acinar': 2.2495337361760903e-06}, {'prob_Micropapillary': 4.5913903790051336e-08}, {'prob_Mucinous': 6.052407570678042e-07}, {'prob_Solid': 4.3918024772438e-08}]

Is this what we need?

swaradgat19 commented 11 months ago

CMU-1.json looks like this: (Updated it with objectType attribute and single dictionary for measurements)

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "3ff4765d-9029-4399-ae58-27d4c553f37a",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              2808,
              22815
            ],
            [
              2808,
              23166
            ],
            [
              2457,
              23166
            ],
            [
              2457,
              22815
            ],
            [
              2808,
              22815
            ]
          ]
        ]
      },
      "properties": {
        "isLocked": true,
        "measurements": {
          "prob_Lepidic": 0.0020260643213987,
          "prob_Benign": 0.9949127435684204,
          "prob_Acinar": 0.0004915686440654,
          "prob_Micropapillary": 0.0009807297028601,
          "prob_Mucinous": 0.0015227942494675,
          "prob_Solid": 0.00006609381671296433
        },
        "objectType": "tile"
      }
    },
kaczmarj commented 11 months ago

the measurements object should be a dictionary, not a list of dictionaries. something like this:

measurements = {k: v for k, v in prob_dict.items()}

and the properties dict should contain

"objectType": "tile"
swaradgat19 commented 11 months ago

the measurements object should be a dictionary, not a list of dictionaries. something like this:

measurements = {k: v for k, v in prob_dict.items()}

and the properties dict should contain

"objectType": "tile"

Yes, I just updated the previous comment with the same changes. Let me know if they're correct

kaczmarj commented 11 months ago

yep, that looks good!

swaradgat19 commented 11 months ago

I guess we'll have to change the test_cli_run_with_registered_models function as well. I'll put in another commit

swaradgat19 commented 11 months ago

@kaczmarj - I have changed the test in the following way:

    res = []
    for i, _ in enumerate(prob_cols):
        res.append(
            np.array(
                [dd["properties"]["measurements"][prob_cols[i]] for dd in d["features"]]
            )
        )

I'm running the tests in the test_all.py file, although some are still failing. (16 failing, 22 passing). Can you suggest a way to test this script? I'm unable to print anything to the terminal, even after adding the -s flag.

swaradgat19 commented 11 months ago

I solved the key error by modifying the test ( given above ). On pytest test_all.py -vv I get:

FAILED test_all.py::test_cli_run_with_registered_models[openslide-False-breast-tumor-resnet34.tcga-brca] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-False-lung-tumor-resnet34.tcga-luad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-False-pancancer-lymphocytes-inceptionv4.tcga] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-False-pancreas-tumor-preactresnet34.tcga-paad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-False-prostate-tumor-resnet34.tcga-prad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-True-breast-tumor-resnet34.tcga-brca] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-True-lung-tumor-resnet34.tcga-luad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-True-pancancer-lymphocytes-inceptionv4.tcga] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-True-pancreas-tumor-preactresnet34.tcga-paad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_registered_models[openslide-True-prostate-tumor-resnet34.tcga-prad] - assert 1 == 0
FAILED test_all.py::test_cli_run_with_local_model - assert 1 == 0
FAILED test_all.py::test_patch_cli[openslide-256-0.25] - assert 1 == 0
FAILED test_all.py::test_patch_cli[openslide-256-0.5] - assert 1 == 0
FAILED test_all.py::test_patch_cli[openslide-350-0.25] - assert 1 == 0
FAILED test_all.py::test_patch_cli[openslide-100-0.3] - assert 1 == 0
FAILED test_all.py::test_patch_cli[openslide-100-0.5] - assert 1 == 0

It seems that the error is with openslide, as the tiffslide tests are working.

E       assert 1 == 0
E        +  where 1 = <Result BackendNotAvailable("OpenSlide is not available. Please install the OpenSlide compiled library and the Python package 'openslide-python'. See https://openslide.org/ for more information.")>.exit_code

test_all.py:323: AssertionError

I have openslide installed as well

kaczmarj commented 11 months ago

ah i see.

if you try to run import openslide in python, you might get an error that libopenslide cannot be found.

i should update the tests to only run the backends if the backends are installed.

kaczmarj commented 11 months ago

also to speed up debugging, you can run an individual test like this:

python -m pytest tests/test_all.py::test_cli_run_with_registered_models[openslide-False-breast-tumor-resnet34.tcga-brca]
kaczmarj commented 11 months ago

@swaradgat19 - in #180 i made a change to skip certain tests if tiffslide or openslide isn't available. once that is merged into main, please merge your branch with main. and hopefully your locally failing tests will be fixed (they'll simply be skipped if openslide can't be used)

kaczmarj commented 11 months ago

@swaradgat19 - #180 is merged now. you can merge main into your branch and try running the tests locally.

when running the tests locally, i like using the pytest argument -v for verbose. each test will be on its own line. you can also run individual tests like this:

python -m pytest -v tests/test_all.py::test_cli_run_with_registered_models[openslide-False-breast-tumor-resnet34.tcga-brca]

there are things in [ ] brackets because this tests uses pytest parametrize and those are the arguments passed to that particular test.

swaradgat19 commented 11 months ago

@kaczmarj Got it. Thanks! I've put in a bunch of commits with changes. It seems to be passing all checks (except the styles one). Also do take a look at the paquo script at link. It is giving no errors, but I cant check since I do not have a linux machine where I can verify the results

kaczmarj commented 11 months ago

@swaradgat19 - i will test that paquo script. thanks.

i made a minor stylistic change to this pr. instead of using

for i, _ in enumerate(prob_cols):
    prob_cols[i]  # get the prob_col

i changed it to

for i, prob_col in enumerate(prob_cols):
    prob_col  # get the prob_col
swaradgat19 commented 11 months ago

Yes, that is more intuitive