CarperAI / Code-Pile

This repository contains all the code for collecting large amounts of code from GitHub.
MIT License
105 stars, 29 forks

WikiBooks Dataset Processing #22

Closed (PhungVanDuy closed 1 year ago)

ncoop57 commented 1 year ago

Hey @PhungVanDuy, looking good! One thing that will need to change, though: please use our interfaces for code datasets. You can see an example of how this works in the stackexchange dataset: https://github.com/CarperAI/Code-Pile/blob/main/codepile/stackexchange/stackexchange.py

Essentially, we want all our datasets to follow the same style so that the code is easier to manage and extend.
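
To illustrate the idea of a shared dataset interface, here is a minimal sketch. It is not the actual Code-Pile API: the names `Dataset`, `download`, `process`, `run`, and `WikiBooksDataset` are hypothetical, and the linked stackexchange.py is the real reference. The point is only that every dataset implements the same small set of methods, so calling code does not need to know which dataset it is handling.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Dataset(ABC):
    """Hypothetical shared interface (an assumption, not Code-Pile's real classes)."""

    @abstractmethod
    def download(self) -> List[Dict[str, str]]:
        """Fetch raw records for this dataset."""

    @abstractmethod
    def process(self, raw: List[Dict[str, str]]) -> List[str]:
        """Turn raw records into cleaned text documents."""

    def run(self) -> List[str]:
        # Same pipeline for every dataset: download, then process.
        return self.process(self.download())


class WikiBooksDataset(Dataset):
    """Hypothetical WikiBooks dataset that plugs into the shared interface."""

    def __init__(self, pages: List[Dict[str, str]]):
        # In-memory pages stand in for a real download in this sketch.
        self.pages = pages

    def download(self) -> List[Dict[str, str]]:
        return list(self.pages)

    def process(self, raw: List[Dict[str, str]]) -> List[str]:
        # Drop pages with no text and prepend the title as a heading.
        return [
            f"# {page['title']}\n{page['text'].strip()}"
            for page in raw
            if page["text"].strip()
        ]
```

With this shape, adding another dataset means writing one more subclass, and any caller can simply use `run()` on whichever dataset it is given.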

PhungVanDuy commented 1 year ago

@ncoop57 I just did a quick refactor, please check.

reshinthadithyan commented 1 year ago

Hello, Daica. Thanks for the PR. PRs should go to the working branch, not main.

PhungVanDuy commented 1 year ago

Thanks for your comment, I just changed it!

ncoop57 commented 1 year ago

@PhungVanDuy looking good! I added some comments.

ncoop57 commented 1 year ago

@PhungVanDuy I added some comments. I think it is almost ready to merge once the unit test issues are resolved!

PhungVanDuy commented 1 year ago

I wrote the unit test here, can you check it? @ncoop57

PhungVanDuy commented 1 year ago

I just saw your example for the unit test in the main branch. Do you mean I should modify the test code to follow the class style?

ncoop57 commented 1 year ago

@PhungVanDuy it can follow either the class style I created or the function style shown in the pytest docs: https://docs.pytest.org/en/7.1.x/getting-started.html
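
For reference, the two styles look roughly like this. This is only a sketch: `clean_title` is a made-up helper used for illustration, not something from this PR. With pytest's default discovery rules, both top-level `test_*` functions and `test_*` methods on `Test*` classes are collected.

```python
def clean_title(title: str) -> str:
    """Hypothetical helper under test: trim whitespace, replace underscores."""
    return title.strip().replace("_", " ")


# Function style: a plain top-level function whose name starts with test_.
def test_clean_title_function_style():
    assert clean_title(" Python_Programming ") == "Python Programming"


# Class style: test_ methods grouped on a class whose name starts with Test.
class TestCleanTitle:
    def test_replaces_underscores(self):
        assert clean_title("C_Sharp") == "C Sharp"

    def test_strips_whitespace(self):
        assert clean_title("  Rust ") == "Rust"
```

Either style works; the class style is mostly useful for grouping related tests under one name.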

PhungVanDuy commented 1 year ago

I just changed the style of the tests.py file, let me know if there are any problems.

ncoop57 commented 1 year ago

@PhungVanDuy could you grant me access to make changes to this PR?

You can enable it by following this blog: https://github.blog/2016-09-07-improving-collaboration-with-forks/

ncoop57 commented 1 year ago

Okay, so I fixed up the tests and pyproject.toml. Sadly, I can't actually run the tests since they require S3 keys haha. @reshinthadithyan can you run the tests? I am not on the HPC. If not, we can just go ahead and merge. It LGTM.

PhungVanDuy commented 1 year ago

Can we merge after today? I need to make some edits to the README for data processing.

ncoop57 commented 1 year ago

@PhungVanDuy I think it is ready to merge. That okay?

PhungVanDuy commented 1 year ago

@ncoop57 yes, please