Open dtsuzuku-ibm opened 1 week ago
Thanks, @dtsuzuku-ibm. I am ok with most of the decisions you have made (like not including the information we can get from github about the Author and Date. However, @agoyal26, who created the template, may insist that we still provide such information in the README file. Since we use your README as a model for others, let me not approve it yet, until @agoyal26 has also reviewed this and we decide what is a "must" in the README file. Again, thanks for doing this before all other transform owners!
I have made some suggestions to the read me. We can skip adding date of revisions but I think adding the list of contributors with their email would be good.
Where are your suggestions, @agoyal26? (other than adding the list of contributors and their emails)
I added them in readme.md itself as comments- are they visible? @shahrokhDaijavad
They are probably on your local copy of the file, @agoyal26. I don't see them. Do you want to just copy and paste your local README file here?
Thanks, @agoyal26 Now, I see your comments!
@agoyal26
Should we not call it content then instead of description in table header ?
I'm ok with that. Should I change the name of description
column of the table in Output columns annotated by this transform
to content
, too? Both columns are for explaining what kind of data you'll see in each column listed in table.
oh my apologies - I get it now. Let it be description of the column. Also please add contributors and authors name and email id to readme. Then looks like we are good to go. Thank you
@shahrokhDaijavad Could you assign any autorized user as reviewer?
@dtsuzuku-ibm I think this is ready to merge. I will ask @touma-I to do this.
@shahrokhDaijavad I'm afraid I'm not authorized to merge this PR... Could you merge this?
@dtsuzuku-ibm Since we don't have the notebook example yet, @touma-I is suggesting adding these few lines of code as an example of how we will be using all transforms in the future using pip install and not cloning the repo. Once we have the notebook example, we may remove these lines and just refer to the notebook.
very good point @touma-I . I agree too- as we are promoting that DPK transforms are easy to use via pip install - we should include these steps.
@touma-I @shahrokhDaijavad @agoyal26
User can refer to src/doc_quality_local_python.py
. Is adding link enough?
I want to avoid copying&pasting duplicated codes.
@touma-I @shahrokhDaijavad @agoyal26 User can refer to
src/doc_quality_local_python.py
. Is adding link enough? I want to avoid copying&pasting duplicated codes.
Hi @dtsuzuku-ibm I understand the desire not to copy/paste and hopefully we will get to the point where we can automate some of this stuff maybe via CI/CD. But for now, it is not clear to me what is the minimum that the user needs to do to use this transform. I think having it explained in 5-6 lines of code will help get us to that point.
@dtsuzuku-ibm and @touma-I, I added a minimal version of the notebook for this transform based on what we have decided as a notebook template for all transforms. I tried to mimic what I learned from Maroun yesterday, but this being my first time, I may have made mistakes. Please check and modify as needed. After checking (testing), @dtsuzuku-ibm please link to this notebook from your README file and we will be ready to merge the PR.
@shahrokhDaijavad @touma-I I've added link to jupyter notebook. Could you review?
@dtsuzuku-ibm The link is fine. However, the output of the notebook you have run shows an error (ModuleNotFoundError: No module named 'doc_quality_transform'). I assume you were able to fix this error and run the notebook to the end, because the final output looks good. Am I right? It would be good to have a version of the notebook without output errors in the repo.
@shahrokhDaijavad I've updated notebook. It seems my latest local change was not synced.
following template https://github.com/IBM/data-prep-kit/issues/753#issuecomment-2460867526
What I didn't include
Why are these changes needed?
Related issue number (if any).
https://github.com/IBM/data-prep-kit/issues/753