IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0
307 stars 134 forks source link

update readme #790

Open dtsuzuku-ibm opened 1 week ago

dtsuzuku-ibm commented 1 week ago

following template https://github.com/IBM/data-prep-kit/issues/753#issuecomment-2460867526

What I didn't include

  1. Header/Author, since we can see it in github
  2. Header/Date, since we can see it in github
  3. Changelog, since we may better consider a way to generate changelog from the commits as github action.
  4. Code Examples and Documentation, since we can provide it in jupyter notebook

Why are these changes needed?

Related issue number (if any).

https://github.com/IBM/data-prep-kit/issues/753

shahrokhDaijavad commented 1 week ago

Thanks, @dtsuzuku-ibm. I am ok with most of the decisions you have made (like not including the information we can get from github about the Author and Date. However, @agoyal26, who created the template, may insist that we still provide such information in the README file. Since we use your README as a model for others, let me not approve it yet, until @agoyal26 has also reviewed this and we decide what is a "must" in the README file. Again, thanks for doing this before all other transform owners!

agoyal26 commented 1 week ago

I have made some suggestions to the read me. We can skip adding date of revisions but I think adding the list of contributors with their email would be good.

shahrokhDaijavad commented 1 week ago

Where are your suggestions, @agoyal26? (other than adding the list of contributors and their emails)

agoyal26 commented 1 week ago

I added them in readme.md itself as comments- are they visible? @shahrokhDaijavad

shahrokhDaijavad commented 1 week ago

They are probably on your local copy of the file, @agoyal26. I don't see them. Do you want to just copy and paste your local README file here?

shahrokhDaijavad commented 1 week ago

Thanks, @agoyal26 Now, I see your comments!

dtsuzuku-ibm commented 1 week ago

@agoyal26

Should we not call it content then instead of description in table header ?

I'm ok with that. Should I change the name of description column of the table in Output columns annotated by this transform to content, too? Both columns are for explaining what kind of data you'll see in each column listed in table.

agoyal26 commented 1 week ago

oh my apologies - I get it now. Let it be description of the column. Also please add contributors and authors name and email id to readme. Then looks like we are good to go. Thank you

dtsuzuku-ibm commented 1 week ago

@shahrokhDaijavad Could you assign any autorized user as reviewer?

shahrokhDaijavad commented 1 week ago

@dtsuzuku-ibm I think this is ready to merge. I will ask @touma-I to do this.

dtsuzuku-ibm commented 1 week ago

@shahrokhDaijavad I'm afraid I'm not authorized to merge this PR... Could you merge this?

shahrokhDaijavad commented 1 week ago

@dtsuzuku-ibm Since we don't have the notebook example yet, @touma-I is suggesting adding these few lines of code as an example of how we will be using all transforms in the future using pip install and not cloning the repo. Once we have the notebook example, we may remove these lines and just refer to the notebook.

agoyal26 commented 1 week ago

very good point @touma-I . I agree too- as we are promoting that DPK transforms are easy to use via pip install - we should include these steps.

dtsuzuku-ibm commented 1 week ago

@touma-I @shahrokhDaijavad @agoyal26 User can refer to src/doc_quality_local_python.py. Is adding link enough? I want to avoid copying&pasting duplicated codes.

touma-I commented 1 week ago

@touma-I @shahrokhDaijavad @agoyal26 User can refer to src/doc_quality_local_python.py. Is adding link enough? I want to avoid copying&pasting duplicated codes.

Hi @dtsuzuku-ibm I understand the desire not to copy/paste and hopefully we will get to the point where we can automate some of this stuff maybe via CI/CD. But for now, it is not clear to me what is the minimum that the user needs to do to use this transform. I think having it explained in 5-6 lines of code will help get us to that point.

shahrokhDaijavad commented 19 hours ago

@dtsuzuku-ibm and @touma-I, I added a minimal version of the notebook for this transform based on what we have decided as a notebook template for all transforms. I tried to mimic what I learned from Maroun yesterday, but this being my first time, I may have made mistakes. Please check and modify as needed. After checking (testing), @dtsuzuku-ibm please link to this notebook from your README file and we will be ready to merge the PR.

dtsuzuku-ibm commented 14 hours ago

@shahrokhDaijavad @touma-I I've added link to jupyter notebook. Could you review?

shahrokhDaijavad commented 12 hours ago

@dtsuzuku-ibm The link is fine. However, the output of the notebook you have run shows an error (ModuleNotFoundError: No module named 'doc_quality_transform'). I assume you were able to fix this error and run the notebook to the end, because the final output looks good. Am I right? It would be good to have a version of the notebook without output errors in the repo.

dtsuzuku-ibm commented 9 hours ago

@shahrokhDaijavad I've updated notebook. It seems my latest local change was not synced.