FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

No dataloader in data_preprocessing #13

Closed: ziqi-zhang closed this issue 3 years ago

ziqi-zhang commented 3 years ago

Hi, I noticed that experiments/centralized/bilstm_exps/main_text_classification.py imports a different data loader for each task. For example:

import data_preprocessing.AGNews.data_loader
import data_preprocessing.SST_2.data_loader
import data_preprocessing.SemEval2010Task8.data_loader
import data_preprocessing.Sentiment140.data_loader
import data_preprocessing.news_20.data_loader

But I couldn't find these data loaders anywhere under data_preprocessing. Was this code removed or lost?

Thanks,

RaymondTseng commented 3 years ago

Hi there,

FedNLP is focused on transformer-based models right now; we will add the LSTM-based models back later. In the meantime, you can try the transformer-based experiments first, e.g. experiments/centralized/transformer_exps/main_tc.py.

Thanks.

ziqi-zhang commented 3 years ago

Hi @RaymondTseng, thanks for your reply; I will try transformer_exps. But there are no scripts or README in that folder that I can use directly. Are the hyper-parameter settings the same as in bilstm_exps? Where can I find the argument settings for main_tc.py?

Also, where can I find the old version of the data loader code? I didn't see any other branches in this repo.

RaymondTseng commented 3 years ago

Hi @ziqi-zhang, take a look at line 92 of https://github.com/FedML-AI/FedNLP/blob/master/experiments/centralized/transformer_exps/main_tc.py
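(If main_tc.py builds its arguments with argparse, which is an assumption here but typical for these experiment scripts, running it with --help will also list every accepted argument and its default.)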

You can find the obsolete data loaders here: https://github.com/FedML-AI/FedNLP/tree/master/data/raw_data_loader/obsolete
But I am not sure whether they still work...

ziqi-zhang commented 3 years ago

@RaymondTseng thanks! I will look at the code.

ziqi-zhang commented 3 years ago

@RaymondTseng I restored the obsolete data loaders and they work well, but I ran into another problem. I ran the centralized 20news experiment with BiLSTM and the accuracy looks good. But when I transplanted the standalone FedAvg from FedML to FedNLP, the accuracy dropped a lot: the training accuracy is good (around 90%) but the test accuracy is only 38%. All the parameter settings are the same as in the centralized experiment. Do you have any hints about what the problem might be? Thanks!
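One sanity check worth running for a gap like this is to confirm that the 38% comes from the aggregated global model evaluated on the full test set, not from averaging per-client metrics over skewed local shards. A minimal sketch (model and test_loader are placeholders here, not FedNLP's actual API):

import torch

def evaluate_global(model, test_loader, device="cuda:0"):
    # Evaluate the aggregated global model on the full centralized test set.
    model.to(device).eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=-1) == y).sum().item()
            total += y.size(0)
    return correct / total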

RaymondTseng commented 3 years ago

@ziqi-zhang Can you share your script or hyper-parameters for the FedAvg experiment?

ziqi-zhang commented 3 years ago

@RaymondTseng The script for the centralized experiment:

python experiments/centralized/bilstm_exps/main_text_classification.py \
  --model bilstm \
  --dataset 20news \
  --data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
  --partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
  --partition_method uniform \
  --hidden_size 300  \
  --num_layers 1 \
  --embedding_dropout 0 \
  --lstm_dropout 0.5 \
  --attention_dropout 0 \
  --batch_size 32 \
  --max_seq_len 512 \
  --embedding_length 300 \
  --lr 0.001 \
  --wd 0.0001 \
  --epochs 30 \
  --embedding_name glove \
  --embedding_file data/pretrained/glove.6B.300d.txt \
  --device cuda:0 \
  --do_remove_stop_words True \
  --do_remove_low_freq_words 5

The script for the FedAvg experiment:

FEDNLP_DATA_HOME=./fednlp_data

CLIENT_NUM=10
WORKER_NUM=10
CI=0

CUDA_VISIBLE_DEVICES=$1 \
python main_fedavg.py \
    --gpu_num_per_server 1 \
    --gpu_server_num 1 \
    --dataset 20news \
    --data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
    --partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
    --client_num_in_total $CLIENT_NUM \
    --client_num_per_round $WORKER_NUM \
    --hidden_size 300  \
    --num_layers 1 \
    --embedding_dropout 0 \
    --lstm_dropout 0.5 \
    --attention_dropout 0 \
    --comm_round 300 \
    --epochs 1 \
    --batch_size 32 \
    --lr 0.001 \
    --client_optimizer adam \
    --wd 0.0001 \
    --max_seq_len 512 \
    --embedding_length 300 \
    --embedding_name glove \
    --embedding_file data/pretrained/glove.6B.300d.txt \
    --do_remove_stop_words True \
    --do_remove_low_freq_words 5 \
    --frequency_of_the_test 10 \
    --ci 0

RaymondTseng commented 3 years ago

@ziqi-zhang Maybe you can try a larger learning rate (like 0.005) and check the result.

ziqi-zhang commented 3 years ago

@RaymondTseng I have solved that problem, but I have another question, about the non-IID label-skew partition. What does alpha mean? Does alpha = 0.5 mean that a client gets 50% of its data from one label and 50% from the other labels?

ziqi-zhang commented 3 years ago

And I have another question about distributed BiLSTM FedAvg. In the README, FedAvg achieves nearly the same performance as centralized training. I would expect FedAvg to perform worse than centralized training, so could the reported FedAvg result be inaccurate?

RaymondTseng commented 3 years ago

@ziqi-zhang For the partition question, take a look at our paper: https://arxiv.org/abs/2104.08815
For FedAvg, you have to set both the client optimizer and the server optimizer to SGD.
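For context: alpha = 0.5 does not mean a fixed 50/50 split. In the common Dirichlet label-skew scheme, alpha is the concentration parameter of a Dirichlet distribution that controls how skewed each client's label mix is: small alpha gives clients dominated by a few labels, large alpha approaches a uniform mix per client. A minimal sketch of that scheme (an illustration, not FedNLP's exact code):

import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    # Split the indices of each label across clients, with per-client
    # proportions drawn from Dirichlet(alpha, ..., alpha).
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for label in np.unique(labels):
        idx = np.where(labels == label)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(np.repeat(alpha, num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

With 20 labels (as in 20news) and alpha = 0.5, most clients end up dominated by a handful of labels; with alpha = 100 each client's label distribution is nearly uniform.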

ziqi-zhang commented 3 years ago

@RaymondTseng I tried SGD but got an even worse result: the test accuracy is only 5%, while with Adam it is 65%. Here is my script. I guess SGD is not suitable in this case?

FEDNLP_DATA_HOME=./fednlp_data

CLIENT_NUM=100
WORKER_NUM=100
CI=0

CUDA_VISIBLE_DEVICES=$1 \
python server_protect/main_fedavg.py \
    --gpu_num_per_server 1 \
    --gpu_server_num 1 \
    --model bilstm \
    --dataset 20news \
    --data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
    --partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
    --partition_method uniform \
    --client_num_in_total $CLIENT_NUM \
    --client_num_per_round $WORKER_NUM \
    --hidden_size 300  \
    --num_layers 1 \
    --embedding_dropout 0 \
    --lstm_dropout 0.5 \
    --attention_dropout 0 \
    --comm_round 100 \
    --epochs 1 \
    --batch_size 32 \
    --lr 0.005 \
    --client_optimizer sgd \
    --wd 0.0001 \
    --max_seq_len 512 \
    --embedding_length 300 \
    --embedding_name glove \
    --embedding_file data/pretrained/glove.6B.300d.txt \
    --do_remove_stop_words True \
    --do_remove_low_freq_words 5 \
    --frequency_of_the_test 10 \
    --ci 0
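For reference, in vanilla FedAvg the server-side "optimizer" is just a sample-count-weighted average of the client models; a server SGD step with learning rate 1.0 on the averaged pseudo-gradient is equivalent. A minimal sketch of that aggregation step (illustrative only, not FedNLP's code):

import torch

def fedavg_aggregate(client_states, client_num_samples):
    # Weighted average of client state_dicts, weighted by local sample counts.
    total = sum(client_num_samples)
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_num_samples)
        )
    return global_state

Since the aggregation itself is optimizer-agnostic, the 5% vs 65% gap may simply come from the client-side learning rate: plain SGD with one local epoch often needs a much larger lr than Adam, so a learning-rate sweep per optimizer is worth trying before concluding SGD is unsuitable here.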