SBU-BMI / tumor-til-survival-analysis

Scripts to run analysis of tumor/TILs.
Apache License 2.0

error when using wsinfer outputs -- no matching files #17

Open kaczmarj opened 1 year ago

kaczmarj commented 1 year ago

hi @lthealy - i am running the tumor-til analysis pipeline on wsinfer outputs. i'm getting an error that "no predictions had exact pairs".

i have attached a tar file with a small dataset (one slide) to reproduce this error.

data.tar.gz

the dataset has the following folder structure:

data
├── samples.csv
├── tils
│   └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv
└── tumor
    └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv

here is the error:

[1] "TIL Algorithm (Threshold): Frontiers InceptionV4: 0.1"
[1] "=========== Params after R parsing, if any misalignment please check your flags ==========="
$algorithm
[1] "inceptionv4"

$tilDir
[1] "/data/results-tils"

$tilThresh
[1] 0.1

$cancDir
[1] "/data/results-tumor"

$cancThresh
[1] 0.5

$sampFile
[1] ""

$outputFile
[1] "output.csv"

$outputDir
[1] "/data/results-tilalign/"

$writePNG
[1] TRUE

$sampInfo
[1] "/data/sample_info.csv"

 . . . Dropping low_res and color- files . . . 
 . . . Checking for tumor/lymph pairs . . . 
 . . . All files have pairs . . . 
Error: No predictions had exact pairs. Please ensure lymph and cancer pairs have the exact same name.
Execution halted

lthealy commented 1 year ago

I think I see it. We have a grep call at line 76 in commandLineAlign.R that only keeps files that start with "prediction" in an effort to drop "color-" files. That returns no entries so everything "matches" because everything is nothing.

WSInfer outputs never have the prediction/color prefix, so I'll add a check: if grep("^prediction") returns no matches, skip that trim. Sound good?
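
To make the failure mode concrete, here is a minimal R sketch. The file names are hypothetical stand-ins, and the pair check is only an assumed form of what commandLineAlign.R does, not a copy of it. It shows the prefix filter emptying both lists and an "all files have pairs" style check then passing vacuously.

```r
# Hypothetical WSInfer-style file names (no "prediction" prefix).
tils <- c("TCGA-3C-AALI-01Z-00-DX1.csv")
canc <- c("TCGA-3C-AALI-01Z-00-DX1.csv")

# The old prefix filter keeps nothing, because no name starts with "prediction".
tils_kept <- tils[grep("^prediction", tils)]
canc_kept <- canc[grep("^prediction", canc)]

length(tils_kept)  # 0
length(canc_kept)  # 0

# all() over an empty logical vector is TRUE, so a check of the assumed form
# "every TIL file has a tumor partner" passes even though nothing is left.
all_paired <- all(tils_kept %in% canc_kept)
all_paired  # TRUE
```

This matches the log above: the "All files have pairs" message is printed, and the later step then finds no predictions to work with.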

lthealy commented 1 year ago

Changes made for the TIL and Canc sections; shown below for TIL only.

Old Code:

tils = tils[grep("^prediction", tils)]
writeLines(" . . . Dropping low_res and color- files . . . ")
if(any(grepl("low_res", tils))){
   tils = tils[-grep("low_res", tils)]
}

New Code:

if(length(grep("^prediction", tils))>0){ ## WSInfer outputs lack prefix, older outputs have prefix. 
   tils = tils[grep("^prediction", tils)]
}

writeLines(" . . . Dropping low_res and color- files . . . ")
if(any(grepl("low_res", tils))){
   tils = tils[-grep("low_res", tils)]
}
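
Since the same change applies to both the TIL and Canc file lists, one way to avoid duplicating it would be a small helper. This is only a sketch: `trim_prediction_files` is a hypothetical name that does not exist in the repository, and the example file names are made up.

```r
# Hypothetical helper (not in the repository): keep only "prediction"-prefixed
# files when any exist, then drop "low_res" files.
trim_prediction_files <- function(files) {
  has_prefix <- grepl("^prediction", files)
  if (any(has_prefix)) {
    # Older outputs: keeping prefixed files also drops "color-" files.
    files <- files[has_prefix]
  }
  # Logical subsetting is safe when nothing matches, unlike
  # files[-grep(...)], which returns an empty vector in that case.
  files[!grepl("low_res", files)]
}

# Made-up names in the older style and in the WSInfer style.
old_style <- c("prediction-A", "color-A", "prediction-A.low_res")
new_style <- c("A.csv", "B.csv")

trim_prediction_files(old_style)  # "prediction-A"
trim_prediction_files(new_style)  # "A.csv" "B.csv"
```

Using `grepl` for the low_res step also removes the need for the `any(grepl(...))` guard around the negative index.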

kaczmarj commented 1 year ago

is there a different path in the code to deal with wsinfer outputs? we would want to take that path if we detect that the files are from wsinfer. there are at least three things we can test:

  1. like you say, a lack of prediction- prefix
  2. the presence of .csv suffixes
  3. the presence of a header in the CSV files

there should also be a message printed saying that it has found wsinfer outputs and will use those.

my only worry about assuming that we have wsinfer outputs if there are no files with prediction- prefixes is that if there are no files at all (or maybe the user passed a nested directory), then the error will be confusing.
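
The three tests and the empty-directory concern above could be sketched roughly as follows. Both function names, the `dir` label, the message wording, and the example column names are hypothetical, chosen only for illustration; none of them come from the pipeline.

```r
# Hypothetical helper: classify a vector of file names from a results directory.
# Stops with a specific message when there are no files at all, so an empty
# or nested directory is not mistaken for WSInfer output.
detect_output_format <- function(files, dir = "<results dir>") {
  if (length(files) == 0) {
    stop("No files found in ", dir,
         ". Check the path (is it a nested directory?).", call. = FALSE)
  }
  if (any(grepl("^prediction", files))) {
    return("legacy")
  }
  if (all(grepl("\\.csv$", files))) {
    message(" . . . Found WSInfer outputs in ", dir, " . . . ")
    return("wsinfer")
  }
  stop("Files in ", dir,
       " have neither a 'prediction' prefix nor a .csv suffix.", call. = FALSE)
}

# Hypothetical header test: TRUE if the first line has a non-numeric field.
has_header <- function(path) {
  first <- readLines(path, n = 1)
  if (length(first) == 0) {
    return(FALSE)  # empty file
  }
  fields <- strsplit(first, ",", fixed = TRUE)[[1]]
  any(is.na(suppressWarnings(as.numeric(fields))))
}

# Try the header test on two small temporary files (made-up column names).
with_header <- tempfile(fileext = ".csv")
writeLines(c("x,y,prob", "1,2,0.5"), with_header)
without_header <- tempfile(fileext = ".csv")
writeLines("1,2,0.5", without_header)

has_header(with_header)     # TRUE
has_header(without_header)  # FALSE
```

Checking for zero files first means the "no prefix, so it must be wsinfer" branch is only reached when there is something to inspect.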

lthealy commented 1 year ago

Yes that'll just require a little shuffling but should be just as straightforward. Currently WSInfer detection is managed after parsing (and really is just a csv suffix check). See lymphFormatCsv object for that detection

lthealy commented 1 year ago

Question @kaczmarj, does WSInfer spit any log files into the output directory? Something we would have to drop on a glob before running? I don't think so, but wanted to make sure.

kaczmarj commented 1 year ago

Yes, it creates several directories: model-outputs, stitches, patches, and a json file with runtime info.

Best,
Jakub


kaczmarj commented 1 year ago

here is a tree of wsinfer outputs. keep in mind that run_metadata_20230225T122426.json includes a timestamp so the actual name will differ across runs.

results-wsinfer
├── masks
│   ├── TCGA-3L-AA1B-01Z-00-DX1.jpg
│   ├── TCGA-4N-A93T-01Z-00-DX1.jpg
│   ├── TCGA-4T-AA8H-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT4-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT5-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT6-01Z-00-DX1.jpg
│   ├── TCGA-5M-AATE-01Z-00-DX1.jpg
│   ├── TCGA-A6-2671-01Z-00-DX1.jpg
│   ├── TCGA-A6-2672-01Z-00-DX1.jpg
│   └── TCGA-A6-2674-01Z-00-DX1.jpg
├── model-outputs
│   ├── TCGA-3L-AA1B-01Z-00-DX1.csv
│   ├── TCGA-4N-A93T-01Z-00-DX1.csv
│   ├── TCGA-4T-AA8H-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT4-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT5-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT6-01Z-00-DX1.csv
│   ├── TCGA-5M-AATE-01Z-00-DX1.csv
│   ├── TCGA-A6-2671-01Z-00-DX1.csv
│   ├── TCGA-A6-2672-01Z-00-DX1.csv
│   └── TCGA-A6-2674-01Z-00-DX1.csv
├── patches
│   ├── TCGA-3L-AA1B-01Z-00-DX1.h5
│   ├── TCGA-4N-A93T-01Z-00-DX1.h5
│   ├── TCGA-4T-AA8H-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT4-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT5-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT6-01Z-00-DX1.h5
│   ├── TCGA-5M-AATE-01Z-00-DX1.h5
│   ├── TCGA-A6-2671-01Z-00-DX1.h5
│   ├── TCGA-A6-2672-01Z-00-DX1.h5
│   └── TCGA-A6-2674-01Z-00-DX1.h5
├── process_list_autogen.csv
├── run_metadata_20230225T122426.json
└── stitches
    ├── TCGA-3L-AA1B-01Z-00-DX1.jpg
    ├── TCGA-4N-A93T-01Z-00-DX1.jpg
    ├── TCGA-4T-AA8H-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT4-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT5-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT6-01Z-00-DX1.jpg
    ├── TCGA-5M-AATE-01Z-00-DX1.jpg
    ├── TCGA-A6-2671-01Z-00-DX1.jpg
    ├── TCGA-A6-2672-01Z-00-DX1.jpg
    └── TCGA-A6-2674-01Z-00-DX1.jpg
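
Given that layout, one option is to list only the CSVs under model-outputs, so masks, patches, stitches, process_list_autogen.csv, and the timestamped run_metadata JSON never reach the pairing step. This is a sketch under that assumption: `list_wsinfer_csvs` is a hypothetical name, and whether the pipeline should be pointed at the results directory or directly at model-outputs is not settled in this thread.

```r
# Hypothetical helper: list per-slide CSVs from a WSInfer results directory,
# ignoring everything outside model-outputs/.
list_wsinfer_csvs <- function(results_dir) {
  model_dir <- file.path(results_dir, "model-outputs")
  if (!dir.exists(model_dir)) {
    stop("No model-outputs directory under ", results_dir, call. = FALSE)
  }
  list.files(model_dir, pattern = "\\.csv$")
}

# Build a tiny stand-in directory tree to try it out.
demo <- file.path(tempdir(), "results-wsinfer-demo")
dir.create(file.path(demo, "model-outputs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "masks"), showWarnings = FALSE)
invisible(file.create(file.path(demo, "model-outputs", "TCGA-3L-AA1B-01Z-00-DX1.csv")))
invisible(file.create(file.path(demo, "masks", "TCGA-3L-AA1B-01Z-00-DX1.jpg")))
invisible(file.create(file.path(demo, "process_list_autogen.csv")))
invisible(file.create(file.path(demo, "run_metadata_20230225T122426.json")))

list_wsinfer_csvs(demo)  # "TCGA-3L-AA1B-01Z-00-DX1.csv"
```

Because the pattern is applied inside model-outputs only, the changing timestamp in the metadata file name does not need to be matched or excluded explicitly.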