AnthusAI / Plexus

An orchestration system for managing text classification at scale using LLMs, ML models, and NLP.
https://anthusai.github.io/Plexus/
MIT License

HuggingFaceDataCache for loading, analysing, and verifying HuggingFace datasets #10

Closed: uokesita closed this pull request 3 weeks ago

uokesita commented 3 weeks ago

Includes tests, both mocked and real.

Almost all of this was written by ChatGPT 4o.
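The PR text does not show the class itself, so here is only a minimal sketch of the idea behind "load, analyse and verify, with mock and real tests", assuming a design in which the loader is injected. Everything below (`SimpleDatasetCache`, its method names, the row-as-dict shape, the `text`/`label` columns) is hypothetical and is not the PR's actual API. Injecting the loader is what lets the same class be exercised by a fast mocked test and by a slower test against the real service.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Loader = Callable[[str, str], List[Row]]


class SimpleDatasetCache:
    """Hypothetical sketch: load each (dataset, split) once, then analyse and verify it."""

    def __init__(self, loader: Loader) -> None:
        # The loader is injected so unit tests can pass a mock instead of calling HuggingFace.
        self._loader = loader
        self._cache: Dict[Tuple[str, str], List[Row]] = {}

    def load(self, name: str, split: str = "train") -> List[Row]:
        key = (name, split)
        if key not in self._cache:
            # Only the first request for a (name, split) pair reaches the loader.
            self._cache[key] = self._loader(name, split)
        return self._cache[key]

    def analyse(self, name: str, split: str = "train", label_column: str = "label") -> Dict[str, Any]:
        rows = self.load(name, split)
        return {
            "num_rows": len(rows),
            "label_counts": dict(Counter(row.get(label_column) for row in rows)),
        }

    def verify(
        self,
        name: str,
        split: str = "train",
        required_columns: Sequence[str] = ("text", "label"),
        text_column: str = "text",
    ) -> List[str]:
        """Return a list of problems found; an empty list means every row passed."""
        problems: List[str] = []
        for i, row in enumerate(self.load(name, split)):
            missing = [col for col in required_columns if col not in row]
            if missing:
                problems.append(f"row {i}: missing column(s) {', '.join(missing)}")
            elif text_column in row and not str(row[text_column]).strip():
                problems.append(f"row {i}: empty text")
        return problems


if __name__ == "__main__":
    calls = []

    def mock_loader(name: str, split: str) -> List[Row]:
        # Stands in for the real HuggingFace download and records each call.
        calls.append((name, split))
        return [
            {"text": "great service", "label": 1},
            {"text": "slow reply", "label": 0},
            {"text": "   ", "label": 1},
        ]

    cache = SimpleDatasetCache(mock_loader)
    print(cache.analyse("some/dataset"))  # {'num_rows': 3, 'label_counts': {1: 2, 0: 1}}
    print(cache.verify("some/dataset"))   # ['row 2: empty text']
    print(len(calls))                     # 1 (the second call was served from the cache)
```

For the "real" side, a loader along the lines of `lambda name, split: list(load_dataset(name, split=split))` from the `datasets` library would return the same list-of-dicts shape, since iterating a `datasets.Dataset` yields one dict per row.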

endymion commented 3 weeks ago

@osledybazo, this is really cool! I can't wait to try it on some of those HuggingFace datasets. I made only one small tweak: I renamed the test file and moved it directly next to the file it tests, as we did with RSpec. That feels better to me than mirroring a parallel folder structure in a separate test folder. I also like to put _test at the end of the filename so that a file and its test file are grouped together in file listings.
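To illustrate why the _test suffix keeps a file and its test together, compare a plain alphabetical sort under the two naming conventions. The module names below are hypothetical placeholders, not the repository's actual files.

```python
# Hypothetical module names; only the sorting behaviour matters here.
modules = ["data_cache", "hugging_face_data_cache", "scorecard"]
tested = ["data_cache", "hugging_face_data_cache"]  # modules that have a test file

# "test_" prefix: every test file sorts into one block after the modules.
prefix_style = sorted([f"{m}.py" for m in modules] + [f"test_{m}.py" for m in tested])

# "_test" suffix: each test file sorts right after the module it tests,
# because "." sorts before "_" in ASCII.
suffix_style = sorted([f"{m}.py" for m in modules] + [f"{m}_test.py" for m in tested])

print(prefix_style)
# ['data_cache.py', 'hugging_face_data_cache.py', 'scorecard.py',
#  'test_data_cache.py', 'test_hugging_face_data_cache.py']
print(suffix_style)
# ['data_cache.py', 'data_cache_test.py', 'hugging_face_data_cache.py',
#  'hugging_face_data_cache_test.py', 'scorecard.py']
```

If the project runs its tests with pytest, no extra configuration should be needed for this convention: pytest's default `python_files` setting already matches both `test_*.py` and `*_test.py`.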

@dereknorrbom, one test in the new test file is commented out because it makes a real external API call to HuggingFace. I think you already have a standard way to mark tests that shouldn't run during development builds because they use real external dependencies instead of mocking them. Could we apply that to the commented-out test so that we have the option of running it in full integration tests?
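The thread doesn't say which mechanism the project already uses for this, so the following is only one possible sketch. It assumes plain `unittest` from the standard library and a hypothetical `RUN_INTEGRATION_TESTS` environment variable: the real-API test is skipped by default and runs only when the variable is set to 1. The test name, dataset ID, and slice are illustrative and may not match the commented-out test. With pytest, a registered custom marker such as `@pytest.mark.integration`, combined with `pytest -m "not integration"` for development builds, achieves the same split.

```python
import os
import unittest

# Hypothetical opt-in switch: real-API tests run only when RUN_INTEGRATION_TESTS=1.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION_TESTS") == "1"


class HuggingFaceDataCacheIntegrationTest(unittest.TestCase):
    @unittest.skipUnless(RUN_INTEGRATION, "set RUN_INTEGRATION_TESTS=1 to call the real HuggingFace API")
    def test_loads_real_dataset(self):
        # Imported inside the test so the skipped case needs neither the
        # datasets package nor network access.
        from datasets import load_dataset

        # Illustrative dataset ID and slice; the actual commented-out test may differ.
        rows = load_dataset("imdb", split="train[:10]")
        self.assertEqual(len(rows), 10)
```

A development build would then run `python -m unittest` as usual and see this test reported as skipped, while a full integration run would set `RUN_INTEGRATION_TESTS=1` first.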