Question: context.output.set() or context.output.upload() or load/store?

magland commented 1 year ago

@luiztauffer

I am wondering whether the context.output.set() should be changed to context.output.upload() in dendro processors.

Here's a tutorial example I was putting together

#!/usr/bin/env python3

import os
import json
from dendro.sdk import App, BaseModel, Field, ProcessorBase, InputFile, OutputFile

app = App(
    'letter_count',
    description="Example Dendro processing app for tutorial",
    app_image="",
    app_executable=os.path.abspath(__file__)
)

description = """
This is the a processor in the letter_count app. It counts the number of times a particular letter appears in a text file and produces and JSON file with the result.
"""

class LetterCountProcessorContext(BaseModel):
    input: InputFile = Field(description='Input text file')
    output: OutputFile = Field(description='Output JSON file')
    letter: str = Field(description='Letter to count')

class LetterCountProcessor(ProcessorBase):
    name = 'letter_count'
    label = 'Letter Count'
    description = description
    tags = ['tutorial']
    attributes = {'wip': True}

    @staticmethod
    def run(context: LetterCountProcessorContext):
        input_fname = 'input.txt'
        output_fname = 'output.json'
        context.input.download(input_fname)
        letter = context.letter

        with open(input_fname, 'r') as f:
            text = f.read()
        count = text.count(letter)

        output = {
            'count': count
        }
        with open(output_fname, 'w') as f:
            f.write(json.dumps(output))
        context.output.set(output_fname)

app.add_processor(LetterCountProcessor)

if __name__ == '__main__':
    app.run()

Note the context.input.download(input_fname) -- so it seems like the corresponding output command should be upload.

But we also want the option of having the input/output files be on the local file system. So perhaps download/upload might not be the right terms. Maybe we should instead have

context.input.load(input_fname)
...
context.output.store(output_fname)

What are your thoughts?

luiztauffer commented 1 year ago

@magland another option could be read/write, what do you think? Any of these work well, I think I'd order my (slight) preference as:

input.read / output.write
input.download / output.upload
input.load / output.store

magland commented 1 year ago

The confusing thing about write is

context.output.write('somefile.txt')

seems like it would mean writing to somefile.txt, whereas it would really be uploading the content of the file to the output bucket.

I think download/upload is the clearest when data are actually remote. But it's confusing when working with local files.

How about input.get / output.set?

luiztauffer commented 1 year ago

seems like it would mean writing to somefile.txt, whereas it would really be uploading the content of the file to the output bucket.

oh, right, in that case I agree we should avoid read/write. I slightly prefer download/upload, as get/set sounds more like attributes operations what about import / export?

magland commented 1 year ago

I think I'll go with download/upload

flatironinstitute / dendro-old

Question: context.output.set() or context.output.upload() or load/store? #43