Closed rchan26 closed 5 months ago
@rchan26 all looks good, but do we really need that option where you pass just the name of the file and, if the file is not in the same directory, it checks if it's in `input`? Can't we simplify it by just allowing the correct path to where the file is?
I mean this:
yeah, I was thinking and debating about this. it is a bit messier, but I was thinking about the case where your data folder isn't `data`, which is the case with the project. in that repo, there's already a `data` folder including the evals, so I made a new folder, `pipeline_data`. in this case, if we only allowed a valid path, we'd have to specify that different folder twice:

```
prompto_run_experiment -f pipeline_data/input/test.jsonl -d pipeline_data
```

in this setup, you could instead do

```
prompto_run_experiment -f test.jsonl -d pipeline_data
```

to run `pipeline_data/input/test.jsonl`.
But I think I agree with you, it's a bit confusing, especially if for some reason you have `test.jsonl` both in the current directory and in the input folder (it would actually run the one in the input folder). I'll remove this option and ask for another review.
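For the record, the lookup precedence being removed can be sketched as below. This is a hypothetical helper (`resolve_input_file` is not prompto's actual code), just illustrating why a bare filename was ambiguous: the copy in the input folder won over one in the current directory.

```python
from pathlib import Path


def resolve_input_file(filename: str, data_folder: str = "data") -> Path:
    """Resolve a bare filename the way the option under discussion did.

    Hypothetical sketch, not prompto's implementation: a copy in
    <data_folder>/input takes precedence over one in the current directory.
    """
    in_input = Path(data_folder) / "input" / filename
    if in_input.exists():
        # the file in the input folder wins, even if a file with the
        # same name also exists in the current directory
        return in_input
    return Path(filename)
```

With `test.jsonl` in both places, `resolve_input_file("test.jsonl", "pipeline_data")` silently picks `pipeline_data/input/test.jsonl`, which is the confusing behaviour being dropped.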
Fixes #10. The filename must be the full path to the file.
Example usage:

- if `test.jsonl` is in the current directory, it'll get moved to the input folder of the data folder, which is `data/input` by default
- if `test.jsonl` is not in the current directory, you must provide the full path to it and it'll get moved to the input folder of the data folder if it isn't already there
- if it is already in the data folder, it is left where it is
- if the filename passed is not a JSONL file, it will error
the file can really be anywhere; it will just get moved to the input folder for us to run. you can specify the data folder using `-d` as usual, and there are flags for max queries per minute (`-m`), max retry attempts (`-a`), and whether or not to run the experiment in "parallel" (`-p`) by querying different APIs in parallel, just like the standard `prompto_run_pipeline` command.

Note: the below is not an option anymore but keeping it here as a record: