Closed Anindyadeep closed 8 months ago
Hi @aniketmaurya , can I please know the status of this PR, I see it is getting in hold for quite some days.
Hi @aniketmaurya , can I please know the status of this PR, I see it is getting in hold for quite some days.
hi @Anindyadeep, thanks for the PR. sorry for late response, I will review it this week!
Hi @aniketmaurya , can I please know the status of this PR, I see it is getting in hold for quite some days.
hi @Anindyadeep, thanks for the PR. sorry for late response, I will review it this week!
Thanks
@aniketmaurya Seems like tests are failing, this means should I need to change the requirements file or do the operations without pandas?
Hi @aniketmaurya, I am cool with it you can take it and finish it. Thanks for reviewing
LGTM. I added a test and made some changes to match the other prepare files.
Can this be added to tutorials/prepare_dataset.md? cc @rasbt
I can take this up additionally for the tutorial if you guys are okay with it. cc: @aniketmaurya @carmocca
I can take this up additionally for the tutorial if you guys are okay with it. cc: @aniketmaurya @carmocca
Sounds great, @Anindyadeep !
I think we should include this new CSV approach above the "custom script" approach since this is potentially easier for some people. I modified the tutorial file accordingly and left a "TODO" section for you to fill in. Thanks for the great contribution in this PR here!
Also, if you have a draft, please let me know, I am happy to help and can then go over it as well.
Thanks @rasbt, I created the documentation, hope this should be good to go.
Just tested the script and it works great. The only nit I have is, shouldn't it be
python scripts/prepare_csv.py --csv_path test_data.csv
instead of
python scripts/prepare_csv.py test_data.csv
for consistency, since we use --arg
s in all other scripts?
Any thoughts @carmocca ?
Just tested the script and it works great. The only nit I have is, shouldn't it be
python scripts/prepare_csv.py --csv_path test_data.csv
Yeah, I agree with this.
Since it is a positional argument, I was not able to provide consistency, however, that can be changed if it is okay.
Sorry, I actually meant changing it in the script itself
Sorry, I actually meant changing it in the script itself
Please correct me if we are on the same page or not. So I was talking about changing the script so that we can have --csv_path
argument introduced 😅
Oh, sorry for causing any confusion. So
So I was talking about changing the script so that we can have --csv_path argument introduced 😅
was what I had originally in mind. But then my brain thought you were thinking I meant the documentation (since I just reviewed that and provided feedback in the previous round of comments).
Long story short, what I have in mind is that it'd be nice to have the --csv_path
in both the script (and then in the documentation).
I think the problem of not having the --csv_path
is that it would show up when someone uses prepare_csv.py --help
to figure out the usage.
Maybe the CLI class doesn't list it there because it's a positional argument. In this case, maybe we can set it to an empty string or None and raise an ValueError if it's not specified. Not sure.
Any thoughts there?
(CC @carmocca who is on top of all the stylistic choices in Lit-GPT)
Oh, sorry for causing any confusion. So
So I was talking about changing the script so that we can have --csv_path argument introduced 😅
was what I had originally in mind. But then my brain thought you were thinking I meant the documentation (since I just reviewed that and provided feedback in the previous round of comments).
Long story short, what I have in mind is that it'd be nice to have the
--csv_path
in both the script (and then in the documentation).I think the problem of not having the
--csv_path
is that it would show up when someone usesprepare_csv.py --help
to figure out the usage.Maybe the CLI class doesn't list it there because it's a positional argument. In this case, maybe we can set it to an empty string or None and raise an ValueError if it's not specified. Not sure.
Any thoughts there?
(CC @carmocca who is on top of all the stylistic choices in Lit-GPT)
Yeah, I have a question, why we are using jsonargparse
instead of fire , Which probably can handle positional arguments as the optional one.
Thanks, @carmocca, I have committed the suggestions and guess we are good to go.
Thanks, @aniketmaurya @rasbt @carmocca for the merge. It has been incredible. I am excited to contribute more.
Thanks for the awesome contribution @Anindyadeep. This is super valuable! (PS: I am currently working on an article on datasets and be excited to mention your awesome PR) Thanks again!
Thanks for the awesome contribution @Anindyadeep. This is super valuable! (PS: I am currently working on an article on datasets and be excited to mention your awesome PR) Thanks again!
Thanks again and it would be awesome to get mentioned. PS: The LoRA Article is one of my best reads from your amazing set of articles. Excited to read the upcoming article on datasets.
This commit aims to add a simple script to prepare a dataset from csv An assumption that the script make is the csv must contain these three columns: "instruction", "input", "output" (all case sensitive)
Fixes #329