Loss turning into 0 during training

Er-gou123 commented 3 weeks ago

I added your instruction data to the dataset for the first stage of fine-tuning. The loss has been close to zero throughout training, and at 27% progress it appears to have become exactly zero. Is this normal?

dataset:
{"instruction": "I have a copy of traffic that may be botnet traffic, but I don't know what type of network it belongs to. Can you give me some information?", "output": "Botnet Detection"}
{"instruction": "You are a traffic analysis expert, please perform the encrypted traffic classification task to determine the application source of the following traffic.", "output": "Encrypted App Classification"}
{"instruction": "What type of application might this network traffic belong to? Please distinguish between normal application and harmful application.", "output": "Malware Traffic Detection"}
{"instruction": "If this is tunnel encrypted network traffic, make an effort to identify and explain the type of user behavior behind it.", "output": "Encrypted VPN Detection"}
{"instruction": "Encrypted applications usually have different traffic patterns in their communication behavior. Please create a piece of traffic data for the Acm application based on your knowledge of application traffic characteristics.", "output": "Encrypted App Generation"}
{"instruction": "I would like you to help analyze the following unknown traffic, determine whether there is concept drift phenomenon, and identify the label type of the traffic. Please note that the unknown patterns that may result from version updates require special attention.", "output": "Concept Drift"}
{"instruction": "Highly sophisticated encrypted traffic can be used for long-term penetration attacks, often by advanced attackers. Check whether the following traffic is APT attack traffic.", "output": "APT Attack Detection"}
{"instruction": "Handle the fingerprint features extracted from network traffic sessions, classify website fingerprints, and identify the website categories corresponding to these features.", "output": "Website Fingerprinting"}
{"instruction": "Check the detailed DoH traffic to determine whether it contains malicious DoH behavior. The data are as follows:", "output": "Malicious DoH Detection"}
{"instruction": "Based on the encrypted application traffic protocol details, traffic characteristics, and payload you have mastered, create a traffic packet of the Arxiv type.", "output": "Encrypted App Generation"}
{"instruction": "Can hidden application types be found in VPN encrypted network traffic data?", "output": "Encrypted VPN Detection"}
{"instruction": "Analyze the following network traffic to identify possible types of malware that violate the rights of computer users.", "output": "Malware Traffic Detection"}
{"instruction": "Please, as a network security consultant, evaluate the following traffic data to determine whether it is consistent with known APT traffic patterns and output malicious or benign labels.", "output": "APT Attack Detection"}
{"instruction": "Hi, I have noticed at home that my children spend a lot of time online and worry that they may be exposed to unsafe apps. I have obtained some network traffic data, please help to analyze it to determine which applications the children are accessing.", "output": "Encrypted App Classification"}
{"instruction": "I suspect that the following may be a malicious doh traffic data tampered by the attacker. Please identify whether the traffic behind this section belongs to normal or malicious doh traffic.", "output": "Malicious DoH Detection"}
{"instruction": "Analyze the pattern characteristics of the Web request data and determine whether it contains malicious attacks.", "output": "Web Attack Detection"}
....

The change in loss during training: [image]

CuiTianyu961030 commented 3 weeks ago

Hi Er-gou123,

Thanks for using our code!

As the picture shows, the learning rate is still decreasing normally, so training is proceeding as expected. The loss appears as zero because ChatGLM's loss is displayed as 0.0 when it is less than 0.0001.
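To illustrate how a small but nonzero loss can show up as 0.0 in a log, here is a minimal sketch. It assumes the logged value is rounded to four decimal places; `displayed_loss` is a hypothetical helper, not a function from TrafficLLM or ChatGLM, and the sample loss values are made up except for 0.012012.

```python
def displayed_loss(raw_loss: float, ndigits: int = 4) -> float:
    """Return the value a log line would show if the loss is rounded.

    Assumption: the training log rounds to `ndigits` decimal places.
    """
    return round(raw_loss, ndigits)


# Illustrative raw losses: the smallest one is still positive,
# but it is shown as 0.0 after rounding.
for raw in (0.012012, 0.0004, 0.00003):
    print(f"raw={raw!r} shown={displayed_loss(raw)!r}")
```

Under this assumption the model is still receiving a nonzero gradient signal; only the printed number is truncated.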

In our experiments, when using the instruction dataset as the training set, we observed that the loss was close to 0 at around epoch 5.5. Therefore, reducing max_steps in your settings will help you complete the training more efficiently.
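As a rough guide for picking a smaller max_steps, the sketch below converts a target epoch count into a step count. The helper name `steps_for_epochs`, the gradient accumulation and device parameters, and the example dataset size of 1000 are all assumptions for illustration; only the 5.5 epoch figure comes from the reply above.

```python
import math


def steps_for_epochs(num_examples: int, target_epochs: float,
                     batch_size: int, grad_accum: int = 1,
                     num_devices: int = 1) -> int:
    """Estimate the optimizer steps needed to reach `target_epochs`."""
    # Examples consumed per optimizer step across all devices.
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return math.ceil(target_epochs * steps_per_epoch)


# Hypothetical dataset of 1000 instruction examples, batch size 1.
print(steps_for_epochs(1000, 5.5, batch_size=1))
```

The result can then be passed as max_steps in your training settings so that training stops around the point where the loss has already flattened.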

Er-gou123 commented 3 weeks ago

Will the same thing happen when training on the traffic datasets?

CuiTianyu961030 commented 3 weeks ago

It depends on the difficulty of the traffic task and the training settings. In our experiments on the USTC TFC 2016 dataset, the loss on the traffic tuning dataset we provided was about 0.012012 after 10000 steps (batch size = 1, epoch = 3.31).
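For reference, the step and epoch figures above imply an approximate training set size. This is a back-of-the-envelope sketch that assumes a single device and no gradient accumulation, neither of which is stated in the thread; the variable names are illustrative.

```python
# Figures quoted in the reply above.
steps = 10000
batch_size = 1
epochs = 3.31

# Assumption: one device, no gradient accumulation, so each step
# consumes exactly `batch_size` examples.
examples_seen = steps * batch_size
implied_dataset_size = examples_seen / epochs

print(f"about {implied_dataset_size:.0f} examples per epoch")
```

If your own dataset is much larger or smaller than this, the number of steps needed to reach a similar loss will likely differ.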

I hope the answer can help you.

Er-gou123 commented 3 weeks ago

I understand, thank you for your explanation.

ZGC-LLM-Safety / TrafficLLM

Loss turning into 0 during training #18