Closed tarekziade closed 5 months ago
I think it's fixed now.
There are a few remaining failures when I run poetry run task test
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_two_hf_models - assert False
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_student_torch_model - hydra.errors.InstantiationException: Error in call to target 'transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained':
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_teacher_torch_model - hydra.errors.InstantiationException: Error in call to target 'transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained':
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistantParallel::test_data - FileNotFoundError: Local file /Users/tarekziade/Dev/bert-squeeze/bert_squeeze/data/local_datasets/parallel/train.json doesn't exist
FAILED tests/assistants/test_train_assistant.py::TestTrainAssistant::test_fastbert_assistant - hydra.errors.InstantiationException: Error in call to target 'bert_squeeze.models.lt_fastbert.LtFastBert':
For the last one see my remark here https://github.com/JulesBelveze/bert-squeeze/pull/55/commits/0ca60a6b5dfe495a3591aee9dc33ac2509cff7ca
For *** FileNotFoundError: Local file /Users/tarekziade/Dev/bert-squeeze/bert_squeeze/data/local_datasets/parallel/train.json doesn't exist
it's happening with:
distil_assistant = DistilAssistant(
    "distil-parallel",
    teacher_kwargs={
        "_target_": "tests.fixtures.dummy_models.Lr",
        "checkpoints": "../tests/fixtures/resources/lr_dummy.bin",
    },
    student_kwargs={
        "_target_": "tests.fixtures.dummy_models.Lr",
    },
    data_kwargs={
        "path": resource_filename(
            "bert_squeeze", "data/local_datasets/parallel_dataset.py"
        ),
        "is_local": True,
        "train_batch_size": 16,
        "eval_batch_size": 4,
    },
)
assert isinstance(distil_assistant.data.train_dataloader(), DataLoader)
where parallel_dataset.py attempts to load a JSON file that is supposed to be in the same dir.
I can't find any JSON files in the project.
@tarekziade looking into the tests rn
I think there's a missing dependency on pydantic
@JulesBelveze you need to install --with dev and --with docs, but maybe we should make that dep explicit in dev
@tarekziade just fixed most of the tests; there's only one left: tests/assistants/test_distil_assistant.py::TestDistilAssistantParallel::test_data.
But I am away from my laptop until mid-next week.
I suggest just commenting out this test, and I'll fix it once I have access to my laptop, so that you can keep moving forward 😸
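As an alternative to commenting the test out, it could be skipped while the data file is missing. This is only a minimal sketch: it assumes the test class can use the standard-library unittest skip decorator, that tests run from the repo root, and that the file lives at the relative path taken from the FileNotFoundError above. TRAIN_JSON is a hypothetical name and the test body is a placeholder.

```python
import os
import unittest

# Assumed relative location, taken from the FileNotFoundError path above.
TRAIN_JSON = os.path.join(
    "bert_squeeze", "data", "local_datasets", "parallel", "train.json"
)


class TestDistilAssistantParallel(unittest.TestCase):
    @unittest.skipUnless(
        os.path.exists(TRAIN_JSON), "parallel/train.json is missing"
    )
    def test_data(self):
        # Placeholder: the existing DistilAssistant assertions would go here.
        pass
```

That way the test would show up as skipped in the report and start running again once the JSON files are restored.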
Nice, thanks for the fixes. I'm trying to fix the last one; then I guess we can merge this branch and I can start my longt5 branch from main.
It feels like the tests are missing a local dataset dir with the JSON files (which used to be at /Users/jules/Desktop/Hypefactors/data-analysis/distil-industry/datasets/). Not sure what to do.
Yeah exactly, that seems to be the problem. I'll search for it on my old laptop and update it if I can't find it.
Feel free to merge this PR and I'll take care of this issue once I'm back @tarekziade 💪🏼
I don't have write access so I will let you merge when you're back
This branch is my attempt to make the project work with Python 3.11+ (it currently fails for me).
There seems to be a mismatch between Dataset and DatasetDict in the code.
I don't know if it's because of the newest version, but for example, the TransformerDataModule loads a dataset and expects it to be a DatasetDict in its setup method. However, load_dataset takes a split from the tests (train 10%), so it ends up being a Dataset and loses test and validation, which breaks setup.
Let me know if my changes make sense before I continue.
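To illustrate the mismatch, here is a rough sketch of one way setup could accept both shapes. It relies on DatasetDict being a dict subclass and on load_dataset returning a single Dataset when a split is passed. as_split_dict and default_split are hypothetical names, not part of bert-squeeze, and the plain dict and list below are stand-ins so the snippet runs without the datasets library.

```python
from collections.abc import Mapping


def as_split_dict(loaded, default_split="train"):
    """Return a plain dict mapping split name -> dataset.

    Hypothetical helper for illustration; not part of bert-squeeze.
    """
    if isinstance(loaded, Mapping):
        # DatasetDict-like input: already keyed by split name.
        return dict(loaded)
    # Single-split input (e.g. a Dataset): wrap it under one split name.
    return {default_split: loaded}


# Stand-ins only: a dict plays the DatasetDict, a list plays the Dataset.
multi = as_split_dict({"train": [1, 2, 3], "test": [4]})
single = as_split_dict([1, 2, 3])

# The single-split case has no "test" or "validation" key, which is the
# situation that breaks setup in the description above.
print(sorted(multi))
print(sorted(single))
```

Whether setup should then fail loudly or fall back when test and validation are absent is a separate design choice that this sketch does not settle.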