Generating geojson files on run command

SBU-BMI / wsinfer

🔥 🚀 Blazingly fast pipeline for patch-based classification in whole slide images

https://wsinfer.readthedocs.io

Apache License 2.0

55 stars 9 forks source link

Generating geojson files on run command #184

Closed swaradgat19 closed 10 months ago

swaradgat19 commented 11 months ago

fixes #181

swaradgat19 commented 11 months ago

I'm currently updating the tests. Will push them as and when I resolve them

swaradgat19 commented 11 months ago

@kaczmarj Since we've updated the result directories (model-outputs-csv/geojson), we will have to update the tests such that we assert that csv and json files be stored in tmp_path/model-outputs-csv and tmp_path/model-outputs-geojson directories right? Just trying to get an intuition so that I can modify the tests accordingly.

swaradgat19 commented 11 months ago

Updated the tests. All are passing except one. In the test test_issue_97, when we are running the command again using runner.invoke, it fails because the output directory already exists (for geojson). Perhaps we can let it generate the resulting geojson directory again? Or should I handle it in the test itself?

kaczmarj commented 11 months ago

Since we've updated the result directories (model-outputs-csv/geojson), we will have to update the tests such that we assert that csv and json files be stored in tmp_path/model-outputs-csv and tmp_path/model-outputs-geojson directories right?

Yes that is correct.

it fails because the output directory already exists (for geojson).

i don't see this error in the github actions logs. what is the traceback?

swaradgat19 commented 11 months ago

It was getting raised because we are checking whether the output directory exists or not. If it was, we were raising the FileExistsError (instead of the Click.Exceptions I believe).

def parallelize_geojson(csvs: list, results_dir: Path, num_workers: int) -> None:
    output = results_dir / "model-outputs-geojson"

    if not results_dir.exists():
        raise FileExistsError(f"results_dir does not exist: {results_dir}")
    if output.exists():
        # raise FileExistsError("Output directory already exists.")
        shutil.rmtree(f"{output}")
# rest of the code

To handle that, I'm just deleting the directory if it already exists(using shutil) and then it is getting created again below. I'm doing this so that the test passes, although we would want to change this.

kaczmarj commented 11 months ago

i see. so what we do for model outputs typically is skip any slides that already have model ouptut CSVs that exist. we should implement the same behavior for the geojson conversion.

so in the list of csvs to be converted, we should remove any that already exist as geojson. so existing geojsons will not be touched.

swaradgat19 commented 11 months ago

Got it. I'll make the changes

swaradgat19 commented 11 months ago

@kaczmarj Not entirely sure why the pytorch-nightly test is failing. Might be an issue with slide_path perhaps?

kaczmarj commented 11 months ago

i think there are two issues.

the style tests are failing. to fix that, run isort and black on the code to format the code.
to fix the pytorch nightly test, i think we need to check that a certain variable is not None.

https://github.com/SBU-BMI/wsinfer/blob/b05b7ee29e8482d2866c8e076a6f417fc7eda02a/wsinfer/wsi.py#L225

add the following between lines 225 and 226

if page0 is None:
    raise CannotReadSpacing()

swaradgat19 commented 11 months ago

Tried a try-except too. Didn't work

kaczmarj commented 11 months ago

i will take a look at this. it could be that something in the tifffile has changed slightly

kaczmarj commented 10 months ago

i'll review this pr soon. in the meantime, can you please merge the main branch into your branch? i made a few fixes in #188 . you will also have to resolve a merge conflict with wsinfer/wsi.py.

swaradgat19 commented 10 months ago

i left a few change requests. thanks for working on this @swaradgat19

Sure @kaczmarj ! I was actually trying to merge the main into my main branch, but ran into issues ( Github isn't allowing me to sync my forked repo because I'm 1 commit behind and 13 commits ahead of SBU-BMI/wsinfer). I've created a new branch fix/geojson_command with #188 included. Should I open a new PR with that branch?

kaczmarj commented 10 months ago

that's fine, let's continue the discussion in #191 . in the future, you can fix this sort of "merge conflict" on the command line. here are some docs that should help https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line

closing because this is replaced by #191