Hi there,
FedNLP is focusing on transformer-based models now; we will add the LSTM-based models back later. You can try the transformer-based models first, e.g. experiments/centralized/transformer_exps/main_tc.py.
Thanks.
Hi @RaymondTseng, thanks for your reply; I will try transformer_exps. But there are no scripts or README in that folder that I can use directly. Are the hyper-parameter settings the same as in bilstm_exps? Or where can I find the argument settings for main_tc.py?
Besides, where can I find the old version of the dataloader code? I didn't see any other branch in this repo.
Hi @ziqi-zhang, take a look at line 92 in this file: https://github.com/FedML-AI/FedNLP/blob/master/experiments/centralized/transformer_exps/main_tc.py
You can find the obsolete data loaders here: https://github.com/FedML-AI/FedNLP/tree/master/data/raw_data_loader/obsolete But I am not sure whether they still work...
@RaymondTseng thanks! I will look at the code.
@RaymondTseng I restored the obsolete dataloaders and they work well, but I ran into another problem. I ran the centralized 20news experiment on BiLSTM and the accuracy looks good. But when I ported the standalone FedAvg from FedML to FedNLP, the accuracy dropped a lot: the training accuracy is fine (around 90%), but the test accuracy is only 38%. All the parameter settings are the same as in the centralized experiment. Do you have any hints about what the problem might be? Thanks!
@ziqi-zhang Can you share your script or hyper-parameters for the FedAvg experiment?
@RaymondTseng The script for the centralized experiment:
python experiments/centralized/bilstm_exps/main_text_classification.py \
--model bilstm \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--partition_method uniform \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--batch_size 32 \
--max_seq_len 512 \
--embedding_length 300 \
--lr 0.001 \
--wd 0.0001 \
--epochs 30 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--device cuda:0 \
--do_remove_stop_words True \
--do_remove_low_freq_words 5
The script for the FedAvg experiment:
FEDNLP_DATA_HOME=./fednlp_data
CLIENT_NUM=10
WORKER_NUM=10
CI=0
CUDA_VISIBLE_DEVICES=$1 \
python main_fedavg.py \
--gpu_num_per_server 1 \
--gpu_server_num 1 \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--client_num_in_total $CLIENT_NUM \
--client_num_per_round $WORKER_NUM \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--comm_round 300 \
--epochs 1 \
--batch_size 32 \
--lr 0.001 \
--client_optimizer adam \
--wd 0.0001 \
--max_seq_len 512 \
--embedding_length 300 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--do_remove_stop_words True \
--do_remove_low_freq_words 5 \
--frequency_of_the_test 10 \
--ci 0
@ziqi-zhang Maybe you can try a larger learning rate (e.g. 0.005) and check the result.
@RaymondTseng I have solved that problem, but I have another question about the non-IID label-skew partition. What does alpha mean? Does alpha = 0.5 mean that one client has 50% of its data from one label and 50% from other labels?
I also have a question about the distributed BiLSTM FedAvg. In the README, FedAvg achieves nearly the same performance as centralized training, but I think FedAvg should perform worse than centralized training. Could the reported FedAvg result be inaccurate?
@ziqi-zhang For the partition question, you can take a look at our paper: https://arxiv.org/abs/2104.08815 For FedAvg, you have to set both the client optimizer and the server optimizer to SGD.
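In short, alpha is the concentration parameter of a Dirichlet distribution: the smaller alpha is, the more each client's data is dominated by a few labels, so alpha = 0.5 does not mean a fixed 50/50 split. Below is a simplified sketch of how a Dirichlet label-skew partition typically works (an illustration only, not FedNLP's exact implementation; dirichlet_label_skew is a made-up helper name):

import numpy as np

def dirichlet_label_skew(labels, num_clients, alpha, seed=0):
    # For each class, split its samples across clients with proportions
    # drawn from Dirichlet(alpha). Smaller alpha -> more skewed shares.
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn the shares into split points and hand out this class's samples.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 20 classes (as in 20news), 10 clients, alpha = 0.5.
labels = np.random.default_rng(1).integers(0, 20, size=2000)
parts = dirichlet_label_skew(labels, num_clients=10, alpha=0.5)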
@RaymondTseng I tried SGD but got an even worse result: the test accuracy is only 5%, while with Adam it is 65%. Here is my script (and, after it, a sketch of how I understand the server optimizer). I guess SGD is not suitable in this case?
FEDNLP_DATA_HOME=./fednlp_data
CLIENT_NUM=100
WORKER_NUM=100
CI=0
CUDA_VISIBLE_DEVICES=$1 \
python server_protect/main_fedavg.py \
--gpu_num_per_server 1 \
--gpu_server_num 1 \
--model bilstm \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--partition_method uniform \
--client_num_in_total $CLIENT_NUM \
--client_num_per_round $WORKER_NUM \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--comm_round 100 \
--epochs 1 \
--batch_size 32 \
--lr 0.005 \
--client_optimizer sgd \
--wd 0.0001 \
--max_seq_len 512 \
--embedding_length 300 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--do_remove_stop_words True \
--do_remove_low_freq_words 5 \
--frequency_of_the_test 10 \
--ci 0
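For reference, this is how I understand the client/server optimizer split (a rough sketch over plain dicts of torch tensors, not FedML's actual code; fedavg_aggregate and server_lr are hypothetical names): the server treats the averaged client update as a pseudo-gradient, and server_lr = 1.0 reduces to vanilla FedAvg.

import torch

def fedavg_aggregate(global_w, client_ws, client_sizes, server_lr=1.0):
    # Sample-weighted average of client models, then a server-side "SGD" step
    # from the old global weights toward that average.
    total = sum(client_sizes)
    new_w = {}
    for k in global_w:
        avg = sum(w[k] * (n / total) for w, n in zip(client_ws, client_sizes))
        # server_lr = 1.0 simply replaces the global weights with the average.
        new_w[k] = global_w[k] + server_lr * (avg - global_w[k])
    return new_w

# Example with two clients and a single parameter:
g = {"w": torch.zeros(3)}
c1 = {"w": torch.ones(3)}
c2 = {"w": 3 * torch.ones(3)}
print(fedavg_aggregate(g, [c1, c2], [100, 300]))  # {'w': tensor([2.5, 2.5, 2.5])}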
Hi, I notice that experiments/centralized/bilstm_exps/main_text_classification.py imports different dataloaders for different tasks. For example:
But I didn't find these dataloaders under data_preprocessing. Was this code removed or lost?
Thanks,