Hi there,
FedNLP is focusing on transformer-based models now; we will add the LSTM-based models back later. You can try the transformer-based models first, e.g. experiments/centralized/transformer_exps/main_tc.py.
Thanks.
Hi @RaymondTseng, thanks for your reply; I will try transformer_exps. But there are no scripts or README in that folder that I can use directly. Are the hyper-parameter settings the same as in bilstm_exps? Or where can I find the argument settings for main_tc.py?
Besides, where can I find the old version of the dataloader code? I didn't see any other branch in this repo.
Hi @ziqi-zhang, take a look at line 92 in this file: https://github.com/FedML-AI/FedNLP/blob/master/experiments/centralized/transformer_exps/main_tc.py
You can find the obsolete data loaders here: https://github.com/FedML-AI/FedNLP/tree/master/data/raw_data_loader/obsolete But I am not sure whether they still work...
@RaymondTseng thanks! I will look at the code.
@RaymondTseng I restored the obsolete dataloaders and they work well, but I ran into another problem. I ran the centralized 20news experiment on BiLSTM and the accuracy looks good. But when I ported the standalone FedAvg from FedML to FedNLP, the accuracy dropped a lot: the training accuracy is fine (around 90%), but the test accuracy is only 38%. All the parameter settings are the same as in the centralized experiment. Do you have any hints about what the problem might be? Thanks!
@ziqi-zhang Can you share your script or hyper-parameters for the FedAvg experiment?
@RaymondTseng The script for the centralized experiment:
python experiments/centralized/bilstm_exps/main_text_classification.py \
--model bilstm \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--partition_method uniform \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--batch_size 32 \
--max_seq_len 512 \
--embedding_length 300 \
--lr 0.001 \
--wd 0.0001 \
--epochs 30 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--device cuda:0 \
--do_remove_stop_words True \
--do_remove_low_freq_words 5
The script for the FedAvg experiment:
FEDNLP_DATA_HOME=./fednlp_data
CLIENT_NUM=10
WORKER_NUM=10
CI=0
CUDA_VISIBLE_DEVICES=$1 \
python main_fedavg.py \
--gpu_num_per_server 1 \
--gpu_server_num 1 \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--client_num_in_total $CLIENT_NUM \
--client_num_per_round $WORKER_NUM \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--comm_round 300 \
--epochs 1 \
--batch_size 32 \
--lr 0.001 \
--client_optimizer adam \
--wd 0.0001 \
--max_seq_len 512 \
--embedding_length 300 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--do_remove_stop_words True \
--do_remove_low_freq_words 5 \
--frequency_of_the_test 10 \
--ci 0
@ziqi-zhang Maybe you can try a larger learning rate (e.g. 0.005) and check the result.
@RaymondTseng I have solved that problem, but I have another question about the non-IID label-skew partition. What does alpha mean? Does alpha = 0.5 mean that one client has 50% of its data from one label and 50% from other labels?
I also have a question about the distributed BiLSTM FedAvg. In the README, FedAvg achieves nearly the same performance as centralized training, but I think FedAvg should perform worse than centralized training. Could the reported FedAvg result be inaccurate?
@ziqi-zhang For the partition question, you can take a look at our paper: https://arxiv.org/abs/2104.08815 For FedAvg, you have to set both the client optimizer and the server optimizer to SGD.
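In short, alpha is the concentration parameter of a Dirichlet distribution: the smaller alpha is, the more each client's data is dominated by a few labels, so alpha = 0.5 does not mean a fixed 50/50 split. Below is a simplified sketch of how a Dirichlet label-skew partition typically works (an illustration only, not FedNLP's exact implementation; dirichlet_label_skew is a made-up helper name):

import numpy as np

def dirichlet_label_skew(labels, num_clients, alpha, seed=0):
    # For each class, split its samples across clients with proportions
    # drawn from Dirichlet(alpha). Smaller alpha -> more skewed shares.
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn the shares into split points and hand out this class's samples.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 20 classes (as in 20news), 10 clients, alpha = 0.5.
labels = np.random.default_rng(1).integers(0, 20, size=2000)
parts = dirichlet_label_skew(labels, num_clients=10, alpha=0.5)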
@RaymondTseng I tried SGD but got an even worse result: the test accuracy is only 5%, while with Adam it is 65%. Here is my script (and, after it, a sketch of how I understand the server optimizer). I guess SGD is not suitable in this case?
FEDNLP_DATA_HOME=./fednlp_data
CLIENT_NUM=100
WORKER_NUM=100
CI=0
CUDA_VISIBLE_DEVICES=$1 \
python server_protect/main_fedavg.py \
--gpu_num_per_server 1 \
--gpu_server_num 1 \
--model bilstm \
--dataset 20news \
--data_file $FEDNLP_DATA_HOME/data_files/20news_data.h5 \
--partition_file $FEDNLP_DATA_HOME/partition_files/20news_partition.h5 \
--partition_method uniform \
--client_num_in_total $CLIENT_NUM \
--client_num_per_round $WORKER_NUM \
--hidden_size 300 \
--num_layers 1 \
--embedding_dropout 0 \
--lstm_dropout 0.5 \
--attention_dropout 0 \
--comm_round 100 \
--epochs 1 \
--batch_size 32 \
--lr 0.005 \
--client_optimizer sgd \
--wd 0.0001 \
--max_seq_len 512 \
--embedding_length 300 \
--embedding_name glove \
--embedding_file data/pretrained/glove.6B.300d.txt \
--do_remove_stop_words True \
--do_remove_low_freq_words 5 \
--frequency_of_the_test 10 \
--ci 0
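For reference, this is how I understand the client/server optimizer split (a rough sketch over plain dicts of torch tensors, not FedML's actual code; fedavg_aggregate and server_lr are hypothetical names): the server treats the averaged client update as a pseudo-gradient, and server_lr = 1.0 reduces to vanilla FedAvg.

import torch

def fedavg_aggregate(global_w, client_ws, client_sizes, server_lr=1.0):
    # Sample-weighted average of client models, then a server-side "SGD" step
    # from the old global weights toward that average.
    total = sum(client_sizes)
    new_w = {}
    for k in global_w:
        avg = sum(w[k] * (n / total) for w, n in zip(client_ws, client_sizes))
        # server_lr = 1.0 simply replaces the global weights with the average.
        new_w[k] = global_w[k] + server_lr * (avg - global_w[k])
    return new_w

# Example with two clients and a single parameter:
g = {"w": torch.zeros(3)}
c1 = {"w": torch.ones(3)}
c2 = {"w": 3 * torch.ones(3)}
print(fedavg_aggregate(g, [c1, c2], [100, 300]))  # {'w': tensor([2.5, 2.5, 2.5])}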
Hi, I notice that experiments/centralized/bilstm_exps/main_text_classification.py imports different dataloaders for different tasks. For example:
But I didn't find these dataloaders under data_preprocessing. Was this code removed or lost?
Thanks,