DDMAL / Rodan

:dragon_face: A web-based workflow engine.
https://rodan2.simssa.ca/

jobs with multiple output files do not have proper naming with batch inputs #1190

Open kyrieb-ekat opened 3 months ago

kyrieb-ekat commented 3 months ago

When you process a lot of images through the classifying job, the layers it separates and outputs aren't renamed "job name - Layer X", etc., as they are when you process a single image. This makes sifting through and renaming the relevant layers somewhat problematic if you need to find and use specific layers for another job. Examples attached below: image 1 shows multiple images with layers separated; image 2 shows a single image with layers separated and successfully renamed.

[Screenshot 2024-07-15 at 9 51 21 AM]
[Screenshot 2024-07-15 at 9 51 50 AM]

homework36 commented 3 months ago

What do you think should be the expected naming practice for this?

kyrieb-ekat commented 3 months ago

Probably would just need to do what it does for the single upload (the BaRa3 example), where it just puts the file name at the front of "Fast Pixelwise... - Layer X", unless you think abridging it might be more helpful/useful.
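A rough sketch of the naming suggested here, for illustration only. The function name `output_name`, the " - " separator between the file name and the job name, and the sample values are all assumptions, not taken from Rodan's code:

```python
def output_name(input_name: str, job_name: str, port_name: str, sep: str = " - ") -> str:
    """Build '<input file name><sep><job name> - <output port name>'.

    Hypothetical helper: the separator after the input name is an assumption;
    only the 'job name - Layer X' part is described in this thread.
    """
    return f"{input_name}{sep}{job_name} - {port_name}"


# "Some Job" stands in for the real job name.
print(output_name("BaRa3", "Some Job", "Layer 1"))
```

With this shape, every layer from a batch run would still carry the name of the image it came from.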

homework36 commented 3 months ago

Okay. I'll check and fix it. It's not a problem within classifying. All jobs have this same behavior.

homework36 commented 1 month ago

Some notes for myself: File paths and naming are handled by Django's upload_to, which should work fine, and we do not do anything else manually to name or rename output files. It looks like when there are multiple outputs for one input, outputs[opt_name]["is_list"] is still False (opt_name is the job name, according to the logger message). I'm not sure, but it seems to me that it should be a list when there are multiple outputs. There are a few lines in RodanTaskType __init__ that seem to do something related, but we never actually get into the else block of the if attrs.get("_abstract") is True statement. Also looking at output.py in the models (Django) now.

homework36 commented 1 month ago

I believe naming of output files is handled here https://github.com/DDMAL/Rodan/blob/e5f620dcfac55721a858ddbec81d85f73bc22dbe/rodan-main/code/rodan/jobs/core.py#L247

However, I cannot print out file names etc. from this code using logger.info with the following settings:

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("rodan")

I did get it working when the logger call is placed right below the class, but not inside the if statement for modes 0 and 1. According to this part of the code, output file names should always contain the job name and the output port name. I wonder if it's a problem with how we handle multiple input resources for the same input port, and some info gets lost.

homework36 commented 1 month ago

Just took another look at package_results. The two packaging modes work as follows:

Mode 0:

tmp_dir/
  Job1 - OutputPortA/
    Resource1.txt
    Resource2.txt
  Job2 - OutputPortB/
    ResourceList1/
      00.jpg
      01.jpg

Mode 1:

tmp_dir/
  Resource1/
    Job1 - OutputPortA.txt
  Resource2/
    Job2 - OutputPortB.txt
  ResourceList1/
    Job2 - OutputPortB.list/
      00.jpg
      01.jpg

However, neither directory structure is reflected on the resources page. We need to check (1) whether this is really used by Rodan and (2) how it shows up in the Rodan client.
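To make the difference between the two layouts concrete, here is a small sketch that rebuilds the paths shown in the trees above. It is not the package_results implementation; the function names `mode0_path` and `mode1_path` and their parameters are made up for illustration:

```python
from pathlib import PurePosixPath
from typing import Optional


def mode0_path(job: str, port: str, resource: str, ext: str,
               index: Optional[int] = None) -> PurePosixPath:
    """Mode 0: one directory per 'job - output port', files named after the resource."""
    base = PurePosixPath("tmp_dir") / f"{job} - {port}"
    if index is None:
        return base / f"{resource}{ext}"
    # A resource list becomes a directory holding zero-padded numbered files.
    return base / resource / f"{index:02d}{ext}"


def mode1_path(job: str, port: str, resource: str, ext: str,
               index: Optional[int] = None) -> PurePosixPath:
    """Mode 1: one directory per resource, files named 'job - output port'."""
    base = PurePosixPath("tmp_dir") / resource
    if index is None:
        return base / f"{job} - {port}{ext}"
    # A resource list becomes a '<job> - <port>.list' directory of numbered files.
    return base / f"{job} - {port}.list" / f"{index:02d}{ext}"


print(mode0_path("Job1", "OutputPortA", "Resource1", ".txt"))
print(mode1_path("Job2", "OutputPortB", "ResourceList1", ".jpg", index=0))
```

In both modes the job name and output port name end up somewhere in the path, which is why losing them on the resources page is surprising.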

homework36 commented 1 month ago

I was able to find out how to modify names for output resources. However, I discovered that our resource names are managed poorly. Sometimes the output resource is named after the job, and sometimes it uses the same name as the input resource specified by the user. I thought about this and realized that it is indeed tricky to get right, because we can easily run into the following scenarios:

  1. single input, single output
  2. single input, multiple outputs
  3. multiple inputs, single output
  4. multiple inputs, multiple outputs

With batch-fed inputs, it gets even more complicated. While I was having a quick discussion with @kyrieb-ekat about what we should do, I found #1001, which mentions the Labeler job. I tested it with a simple case and it turns out to be very useful here. This means that, for a classifying job for example, we can add a label "layer X" to the desired output port, and then all output resources keep the original file name but carry the "layer X" label. Currently we can even filter by the label and then batch download. There was a follow-up issue, #1127, for Labeler, but I wasn't able to reproduce it on my local server. When I ran my tests, it worked as expected. Maybe we should work on this and/or make a job like Renamer in a similar fashion?

homework36 commented 6 days ago

See #1221.

Users need to take care of the formatting themselves. For example, "custom prefix - " will lead to "custom prefix - old name", while "custom prefix" will become "custom prefixold name".
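The behavior described above is consistent with plain string concatenation. A minimal sketch, assuming that is what happens (the function `apply_prefix` is hypothetical and not taken from #1221):

```python
def apply_prefix(prefix: str, old_name: str) -> str:
    """Prepend the prefix exactly as given; no separator is added."""
    return prefix + old_name


print(apply_prefix("custom prefix - ", "old name"))  # custom prefix - old name
print(apply_prefix("custom prefix", "old name"))     # custom prefixold name
```

So any separator, such as a trailing " - ", has to be part of the prefix the user types.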