Open vGsteiger opened 1 year ago
The workflow of using Pushshift data with a text classifier is the following:
This breaks down into the following components:
I don't see any modifications needed for this integration; adding the custom classes for Dataset & Trainer and configuring the DataLoader correctly should do it. I will document anything I find in the base components that needs adapting.
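To make the Dataset part a bit more concrete, here is a rough sketch of what such a custom class could look like. It is not the actual implementation: the class name `PushshiftTextDataset`, the helper `batches`, and the column names `body` and `subreddit` are assumptions for illustration. It uses only the standard library and mirrors the map-style protocol (`__len__` and `__getitem__`) that PyTorch's `Dataset` uses; the real class would subclass the framework's base class and be fed to a properly configured DataLoader.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple


class PushshiftTextDataset:
    """Hypothetical map-style dataset: exposes __len__ and __getitem__."""

    def __init__(self, csv_file, text_column: str = "body",
                 label_column: str = "subreddit") -> None:
        # Column names are assumptions; adjust them to the saved data.
        reader = csv.DictReader(csv_file)
        self._samples: List[Tuple[str, str]] = [
            (row[text_column], row[label_column]) for row in reader
        ]
        # Map each distinct label to an integer id (sorted for determinism).
        labels = sorted({label for _, label in self._samples})
        self.label_to_id: Dict[str, int] = {
            label: i for i, label in enumerate(labels)
        }

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, index: int) -> Tuple[str, int]:
        text, label = self._samples[index]
        return text, self.label_to_id[label]


def batches(dataset, batch_size: int) -> Iterator[List[Tuple[str, int]]]:
    """Yield consecutive batches; a stand-in for a configured DataLoader."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]
```

A Trainer for the text classifier would then iterate over these batches, tokenize the text, and run the usual training step.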
Hi @roxanastiuca!
Thanks for your evaluation. Could you still add issue(s) for the custom classes for Dataset and Trainer? Just to keep it nicely packaged and to have a place to comment and note things down.
Thanks!
Progress:
Next up:
Also, if Pushshift is down, we can use the CSV data source with data I have already queried and saved.
@roxanastiuca what is the progress here? Did you want to create separate issues similar to Ambarish? For example, #86, #87, and #88.
It would be great to have a rundown with individual tasks like Ambarish did, indeed!
@MaxiBoether I think we might as well close this issue too? I think Olga is tracking this in new PRs, and this was meant as a meta issue for creating the correct issues, which was never really done.
Is Olga tracking this in new issues? #101 is set to solve this issue here, right?
@OlgaOvcharenko If you're already tracking this, please feel free to close this issue.
To plan better and get an overview of the tasks still open for implementing the Pushshift Reddit benchmark, I ask you to go through the following functions with regard to the benchmark's integration into our testing infrastructure:
I know that without the underlying architecture fully implemented these issues will be rather high level, but we can refine them later without problems. It's just good if you are already thinking about how your contribution fits with the architecture.
If you have any questions, please feel free to contact me :)