center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.
https://ctid.mitre-engenuity.org/our-work/tram/
Apache License 2.0

Need help: BERT Fine Tuning #215

Closed abhishekdhiman25 closed 4 months ago

abhishekdhiman25 commented 4 months ago

Hi Reader

I installed TRAM on a Windows system by following the developer setup guide. I want to fine-tune SciBERT on my own data using the "fine_tune_multi_label.ipynb" notebook, and I would like to know whether I need to change the classes to match my data. I have prepared training data for all 537 ATT&CK labels in the same format as "multi_label.json". Is it necessary to change CLASSES in the 2nd cell of "fine_tune_multi_label.ipynb"? If yes, how should it be done, and is there a particular format for it? My colleague tried this earlier and got an error related to the out_features of the BERT model being set to 50, so he set it to 537, but the accuracy score then dropped to zero.

Please tell me how I can fine-tune the model on my data with more than 50 ATT&CK labels.

For reference, the code in the 2nd cell:

    from sklearn.preprocessing import MultiLabelBinarizer as MLB

    CLASSES = [
        'T1003.001', 'T1005', 'T1012', 'T1016', 'T1021.001',
        'T1027', 'T1033', 'T1036.005', 'T1041', 'T1047',
        'T1053.005', 'T1055', 'T1056.001', 'T1057', 'T1059.003',
        'T1068', 'T1070.004', 'T1071.001', 'T1072', 'T1074.001',
        'T1078', 'T1082', 'T1083', 'T1090', 'T1095',
        'T1105', 'T1106', 'T1110', 'T1112', 'T1113',
        'T1140', 'T1190', 'T1204.002', 'T1210', 'T1218.011',
        'T1219', 'T1484.001', 'T1518.001', 'T1543.003', 'T1547.001',
        'T1548.002', 'T1552.001', 'T1557.001', 'T1562.001', 'T1564.001',
        'T1566.001', 'T1569.002', 'T1570', 'T1573.001', 'T1574.002',
    ]

    mlb = MLB(classes=CLASSES)
    mlb.fit([[c] for c in CLASSES])

    mlb
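To make the question concrete, here is a minimal sketch of how the class list could be derived from custom data rather than hardcoded, so that the label set and the number of output units stay in step. It is not taken from the TRAM notebook. The `examples` records, their `text` and `labels` keys, and the helpers `derive_classes` and `to_multi_hot` are hypothetical names used only for illustration; the real layout of "multi_label.json" may differ.

```python
# Hypothetical training records; the actual multi_label.json layout may differ.
examples = [
    {"text": "Dumps LSASS memory.", "labels": ["T1003.001"]},
    {"text": "Collects local files and sends them over C2.", "labels": ["T1005", "T1041"]},
    {"text": "Queries the registry.", "labels": ["T1012"]},
]


def derive_classes(records):
    """Return the sorted list of distinct labels that occur in the data."""
    return sorted({label for record in records for label in record["labels"]})


def to_multi_hot(labels, classes):
    """Encode one record's labels as a 0/1 vector aligned with `classes`."""
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        # A label missing from the class list would otherwise be dropped silently.
        raise ValueError(f"labels not in class list: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


CLASSES = derive_classes(examples)
NUM_LABELS = len(CLASSES)  # the classifier's output size would need to equal this
targets = [to_multi_hot(record["labels"], CLASSES) for record in examples]
```

A `CLASSES` list built this way could presumably be passed to `MLB(classes=CLASSES)` exactly as in the 2nd cell, with `NUM_LABELS` (537 for the full label set described above) used wherever the model's output size is configured.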

mehaase commented 4 months ago

I think this is answered in #216. Please re-open if there's something I missed.