FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

object doesn't exist for text classification script #15

Closed ayanflow closed 2 years ago

ayanflow commented 3 years ago

If I run a text classification model with distilbert using:

DATA_NAME=20news
CUDA_VISIBLE_DEVICES=1 python -m experiments.centralized.transformer_exps.main_tc \
    --dataset ${DATA_NAME} \
    --data_file ~/fednlp_data/data_files/${DATA_NAME}_data.h5 \
    --partition_file ~/fednlp_data/partition_files/${DATA_NAME}_partition.h5 \
    --partition_method niid_label_clients=100.0_alpha=5.0 \
    --model_type distilbert \
    --model_name distilbert-base-uncased \
    --do_lower_case True \
    --train_batch_size 32 \
    --eval_batch_size 8 \
    --max_seq_length 256 \
    --learning_rate 5e-5 \
    --epochs 20 \
    --evaluate_during_training_steps 500 \
    --output_dir /tmp/${DATA_NAME}_fed/ \
    --n_gpu 1

I get the error 'KeyError: "Unable to open object (object 'niid_label_clients=100.0_alpha=5.0' doesn't exist)"', even though the object should exist.

chaoyanghe commented 3 years ago

@DeviRule could you please check this issue?

LorrinWWW commented 3 years ago

Same here

LorrinWWW commented 3 years ago

After looking into the data and code, it seems that the default value of "--partition_method" in the example is incorrect. It should be "niid_label_clients=100_alpha=5.0" rather than "niid_label_clients=100.0_alpha=5.0".
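For anyone hitting the same KeyError, here is a rough sketch of how to spot this kind of near-miss key. It assumes the partition methods are stored as top-level keys of the partition file; the helper name `suggest_partition_keys` and the example key list (including "uniform") are made up for illustration and are not part of FedNLP.

```python
import difflib


def suggest_partition_keys(requested, available_keys, n=3, cutoff=0.6):
    """Return stored keys that exactly match or closely resemble `requested`."""
    keys = list(available_keys)
    if requested in keys:
        return [requested]
    # Fuzzy matching catches small differences such as "100.0" vs "100".
    return difflib.get_close_matches(requested, keys, n=n, cutoff=cutoff)


# Hypothetical key list for illustration. With h5py installed, the real keys
# could be read with: list(h5py.File(partition_file, "r").keys())
stored = ["uniform", "niid_label_clients=100_alpha=5.0"]
matches = suggest_partition_keys("niid_label_clients=100.0_alpha=5.0", stored)
print(matches)
```

Whatever name this returns for your file is the value to pass to --partition_method.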

Another issue is that the following lines of "data_manager/text_classification_data_manager.py" raise errors:
https://github.com/FedML-AI/FedNLP/blob/55004052297c99f5328b1e51834ec4f0a5bb1373/data_manager/text_classification_data_manager.py#L23-L24
Simply removing ".decode('utf-8')" resolves this.

DeviRule commented 3 years ago

@LorrinWWW Please check your h5py version; we are using 3.1.0. With h5py 3.1.0 you have to keep the .decode("utf-8"), otherwise it will raise a type error. If you encounter any other issue, please copy the full error message and post it in a new issue.
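If pinning the h5py version is not an option, one possible workaround is to decode only when the value read from the file is actually bytes. This is a minimal sketch, not the code used in the repository; `to_text` is a hypothetical helper name. It relies on the fact that h5py 3.x returns variable-length strings as bytes by default, while h5py 2.x returned them as str.

```python
def to_text(value):
    """Return `value` as str, decoding UTF-8 bytes when necessary."""
    if isinstance(value, bytes):
        # h5py 3.x hands back bytes for variable-length strings.
        return value.decode("utf-8")
    # h5py 2.x already returns str, so there is nothing to decode.
    return value


print(to_text(b"hello"))
print(to_text("hello"))
```

In the data manager this would wrap the value read from the file, for example to_text(data_file["X"][str(idx)][()]), in place of calling .decode("utf-8") directly.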

ysgncss commented 2 years ago

I still get an error of the same kind:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading index from h5 file.: 100%|██████████| 100/100 [00:00<00:00, 1058.75it/s]
Loading data from h5 file.: 0%| | 0/11314 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/fednlp/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/fednlp/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/FedNLP-master/experiments/centralized/transformer_exps/main_tc.py", line 91, in <module>
    train_dl, test_dl = dm.load_centralized_data()
  File "/home/FedNLP-master/data_manager/base_data_manager.py", line 112, in load_centralized_data
    train_data = self.read_instance_from_h5(data_file, train_index_list)
  File "/home/FedNLP-master/data_manager/text_classification_data_manager.py", line 25, in read_instance_from_h5
    X.append(data_file["X"][str(idx)][()])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/root/miniconda3/envs/fednlp/lib/python3.7/site-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (bad heap free list)'