Closed rchan26 closed 5 months ago
@rchan26 all looks good, but do we really need that option where you pass just the name of the file and, if the file is not in the same directory, it checks if it's in `input`? Can't we simplify it by just allowing the correct path to where the file is?
I mean this:
yeah, I was thinking and debating about this. it is a bit messier, but I was thinking about the case where your data folder isn't `data`, which is the case with the project. in that repo, there's already a `data` folder including the evals, so I made a new folder, `pipeline_data`. in this case, if we only allowed a valid path, we'd have to specify that different folder twice:

```
prompto_run_experiment -f pipeline_data/input/test.jsonl -d pipeline_data
```

in this setup, you could instead do

```
prompto_run_experiment -f test.jsonl -d pipeline_data
```

to run `pipeline_data/input/test.jsonl`.
But I think I agree with you, it's a bit confusing, especially if for some reason you have `test.jsonl` both in the current directory and in the input folder (it would actually run the one in the input folder). I'll remove this option and ask for another review.
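For the record, the lookup precedence being removed can be sketched as below. This is a hypothetical helper (`resolve_input_file` is not prompto's actual code), just illustrating why a bare filename was ambiguous: the copy in the input folder won over one in the current directory.

```python
from pathlib import Path


def resolve_input_file(filename: str, data_folder: str = "data") -> Path:
    """Resolve a bare filename the way the option under discussion did.

    Hypothetical sketch, not prompto's implementation: a copy in
    <data_folder>/input takes precedence over one in the current directory.
    """
    in_input = Path(data_folder) / "input" / filename
    if in_input.exists():
        # the file in the input folder wins, even if a file with the
        # same name also exists in the current directory
        return in_input
    return Path(filename)
```

With `test.jsonl` in both places, `resolve_input_file("test.jsonl", "pipeline_data")` silently picks `pipeline_data/input/test.jsonl`, which is the confusing behaviour being dropped.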
Fixes #10. The filename must be the full path to the file.
Example usage:

- if `test.jsonl` is in the current directory, it'll get moved to the input folder of the data folder, which is `data/input` by default
- if `test.jsonl` is not in the current directory, you must provide the full path to it and it'll get moved to the input folder of the data folder if it isn't already there
- if it is already in the data folder, it is left where it is
- if the filename passed is not a JSONL file, it will error
the file can really be anywhere; it will just get moved to the input folder for us to run. you can specify the data folder using `-d` as usual, and there are flags for max queries per minute (`-m`), max retry attempts (`-a`), and whether or not to run the experiment in "parallel" (`-p`) by querying different APIs in parallel, just like the standard `prompto_run_pipeline` command.

Note: the below is not an option anymore but keeping it here as a record: