flatironinstitute / dendro-old

Analyze neuroscience data in the cloud
https://flatironinstitute.github.io/dendro-docs/
Apache License 2.0
19 stars 2 forks source link

Question: context.output.set() or context.output.upload() or load/store? #43

Closed magland closed 1 year ago

magland commented 1 year ago

@luiztauffer

I am wondering whether the context.output.set() should be changed to context.output.upload() in dendro processors.

Here's a tutorial example I was putting together

#!/usr/bin/env python3

import os
import json
from dendro.sdk import App, BaseModel, Field, ProcessorBase, InputFile, OutputFile

app = App(
    'letter_count',
    description="Example Dendro processing app for tutorial",
    app_image="",
    app_executable=os.path.abspath(__file__)
)

description = """
This is the a processor in the letter_count app. It counts the number of times a particular letter appears in a text file and produces and JSON file with the result.
"""

class LetterCountProcessorContext(BaseModel):
    input: InputFile = Field(description='Input text file')
    output: OutputFile = Field(description='Output JSON file')
    letter: str = Field(description='Letter to count')

class LetterCountProcessor(ProcessorBase):
    name = 'letter_count'
    label = 'Letter Count'
    description = description
    tags = ['tutorial']
    attributes = {'wip': True}

    @staticmethod
    def run(context: LetterCountProcessorContext):
        input_fname = 'input.txt'
        output_fname = 'output.json'
        context.input.download(input_fname)
        letter = context.letter

        with open(input_fname, 'r') as f:
            text = f.read()
        count = text.count(letter)

        output = {
            'count': count
        }
        with open(output_fname, 'w') as f:
            f.write(json.dumps(output))
        context.output.set(output_fname)

app.add_processor(LetterCountProcessor)

if __name__ == '__main__':
    app.run()

Note the context.input.download(input_fname) -- so it seems like the corresponding output command should be upload.

But we also want the option of having the input/output files be on the local file system. So perhaps download/upload might not be the right terms. Maybe we should instead have

context.input.load(input_fname)
...
context.output.store(output_fname)

What are your thoughts?

luiztauffer commented 1 year ago

@magland another option could be read/write, what do you think? Any of these work well, I think I'd order my (slight) preference as:

magland commented 1 year ago

The confusing thing about write is

context.output.write('somefile.txt')

seems like it would mean writing to somefile.txt, whereas it would really be uploading the content of the file to the output bucket.

I think download/upload is the clearest when data are actually remote. But it's confusing when working with local files.

How about input.get / output.set?

luiztauffer commented 1 year ago

seems like it would mean writing to somefile.txt, whereas it would really be uploading the content of the file to the output bucket.

oh, right, in that case I agree we should avoid read/write. I slightly prefer download/upload, as get/set sounds more like attributes operations what about import / export?

magland commented 1 year ago

I think I'll go with download/upload