JulesBelveze / bert-squeeze

🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
https://julesbelveze.github.io/bert-squeeze/

Update to Python 3.11 #55

tarekziade closed 5 months ago

tarekziade commented 5 months ago

This branch is my attempt to make the project work with Python 3.11+ (it currently fails for me).

There seems to be a mismatch between Dataset and DatasetDict in the code.

I don't know if it's because of the newest version, but for example, TransformerDataModule loads a dataset and expects it to be a DatasetDict in its setup method. However, the tests pass a split to load_dataset (train 10%), so the result is a Dataset, which loses the test and validation splits, and that breaks setup.

Let me know if my changes make sense before I continue.
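
The mismatch described above can be caught early with a small guard. This is only a sketch, not the project's code: `check_splits` and `REQUIRED_SPLITS` are hypothetical names, and the split names are assumed. It relies on `DatasetDict` being a `dict` subclass while a single `Dataset` is not a mapping, so plain Python objects stand in for both here.

```python
from collections.abc import Mapping
from typing import Any

# Assumed split names; adjust to whatever the data module's setup expects.
REQUIRED_SPLITS = ("train", "validation", "test")


def check_splits(loaded: Any, required=REQUIRED_SPLITS) -> Mapping:
    """Return `loaded` if it is a mapping holding every required split, else raise."""
    if not isinstance(loaded, Mapping):
        # A single split (what a `split=` argument produces) ends up here.
        raise TypeError(
            f"expected a mapping of splits, got {type(loaded).__name__}; "
            "was a `split=` argument passed to the loader?"
        )
    missing = [name for name in required if name not in loaded]
    if missing:
        raise KeyError(f"missing splits: {missing}")
    return loaded


# Stand-ins: a dict plays the role of a DatasetDict, a list the role of a Dataset.
all_splits = {"train": [1, 2, 3], "validation": [4], "test": [5]}
single_split = [1, 2, 3]

check_splits(all_splits)  # passes silently
try:
    check_splits(single_split)
except TypeError as err:
    print(err)
```

Failing in one place with an explicit message is easier to debug than a missing-key error deep inside setup.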

tarekziade commented 5 months ago

I think it's fixed now.

There are a few remaining failures when I run poetry run task test

FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_two_hf_models - assert False
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_student_torch_model - hydra.errors.InstantiationException: Error in call to target 'transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained':
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistant::test_teacher_torch_model - hydra.errors.InstantiationException: Error in call to target 'transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained':
FAILED tests/assistants/test_distil_assistant.py::TestDistilAssistantParallel::test_data - FileNotFoundError: Local file /Users/tarekziade/Dev/bert-squeeze/bert_squeeze/data/local_datasets/parallel/train.json doesn't exist
FAILED tests/assistants/test_train_assistant.py::TestTrainAssistant::test_fastbert_assistant - hydra.errors.InstantiationException: Error in call to target 'bert_squeeze.models.lt_fastbert.LtFastBert':

For the last one, see my remark here: https://github.com/JulesBelveze/bert-squeeze/pull/55/commits/0ca60a6b5dfe495a3591aee9dc33ac2509cff7ca

tarekziade commented 5 months ago

For *** FileNotFoundError: Local file /Users/tarekziade/Dev/bert-squeeze/bert_squeeze/data/local_datasets/parallel/train.json doesn't exist

it's happening with:

        distil_assistant = DistilAssistant(
            "distil-parallel",
            teacher_kwargs={
                "_target_": "tests.fixtures.dummy_models.Lr",
                "checkpoints": "../tests/fixtures/resources/lr_dummy.bin",
            },
            student_kwargs={
                "_target_": "tests.fixtures.dummy_models.Lr",
            },
            data_kwargs={
                "path": resource_filename(
                    "bert_squeeze", "data/local_datasets/parallel_dataset.py"
                ),
                "is_local": True,
                "train_batch_size": 16,
                "eval_batch_size": 4,
            },
        )
        assert isinstance(distil_assistant.data.train_dataloader(), DataLoader)

where parallel_dataset.py attempts to load a JSON file that is supposed to be in the same directory.

I can't find any JSON files in the project.
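
One way to confirm which data files the loading script would need is to check for them next to the script before building the assistant. The helper below is a hypothetical sketch (`missing_data_files` is not part of bert-squeeze). The relative name `parallel/train.json` is taken from the error message above, and a temporary directory stands in for the package directory.

```python
import tempfile
from pathlib import Path


def missing_data_files(script_path, names):
    """List expected data files that are absent, resolved relative to the script's directory."""
    data_dir = Path(script_path).resolve().parent
    return [str(data_dir / name) for name in names if not (data_dir / name).is_file()]


# Demo in a throwaway directory: the script exists, the JSON file does not.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "parallel_dataset.py"
    script.touch()
    print(missing_data_files(script, ["parallel/train.json"]))
```

In the failing test, `script_path` would be the value returned by `resource_filename("bert_squeeze", "data/local_datasets/parallel_dataset.py")`.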

JulesBelveze commented 5 months ago

@tarekziade looking into the tests rn

I think there's a missing dependency on pydantic

tarekziade commented 5 months ago

@JulesBelveze you need to install with --with dev and --with docs, but maybe we should make that dependency explicit in dev

JulesBelveze commented 5 months ago

@tarekziade just fixed most of the tests, there's only one left: tests/assistants/test_distil_assistant.py::TestDistilAssistantParallel::test_data. But I am away from my laptop until mid-next week. I suggest just commenting out this test so that you can keep moving forward, and I'll fix it once I have access to my laptop 😸
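
As an alternative to commenting the test out, it could be skipped whenever the data file is absent, so it starts running again once the file is restored. This is a minimal sketch using the standard library's `unittest`. Whether the project's tests are written this way is an assumption, the test body is a placeholder, and the path comes from the error message earlier in this thread, taken relative to the repository root.

```python
import unittest
from pathlib import Path

# Path from the FileNotFoundError above; assumes tests run from the repository root.
TRAIN_JSON = Path("bert_squeeze/data/local_datasets/parallel/train.json")


class TestDistilAssistantParallel(unittest.TestCase):
    @unittest.skipUnless(TRAIN_JSON.is_file(), f"missing local dataset: {TRAIN_JSON}")
    def test_data(self):
        # Placeholder: the real body would build the DistilAssistant and
        # check that train_dataloader() returns a DataLoader.
        self.assertTrue(TRAIN_JSON.is_file())


def run_suite():
    """Run only this test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDistilAssistantParallel)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` reports the test as skipped rather than failed while the file is missing.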

tarekziade commented 5 months ago

Nice, thanks for the fixes. I'm trying to fix the last one; I guess we can then merge this branch and I can start my longt5 branch from main.

tarekziade commented 5 months ago

It feels like the tests are missing a local dataset directory with the JSON files (which used to be at /Users/jules/Desktop/Hypefactors/data-analysis/distil-industry/datasets/). Not sure what to do.

JulesBelveze commented 5 months ago

Yeah, exactly, that seems to be the problem. I'll search for it on my old laptop and update it if I can't find it.

Feel free to merge this PR and I'll take care of this issue once I'm back @tarekziade 💪🏼

tarekziade commented 5 months ago

I don't have write access, so I will let you merge when you're back.