center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.
https://ctid.mitre-engenuity.org/our-work/tram/
Apache License 2.0

Question: Is it required to change data inside config.json if I want to use fine-tuned Sci-BERT for report analysis? #212

Closed abhishekdhiman25 closed 4 months ago

abhishekdhiman25 commented 5 months ago

Hi Reader

I have installed TRAM on Windows 10 by following the developer setup documentation. I fine-tuned Sci-BERT using the notebook "fine_tune_multi_label.ipynb" (path to notebook: tram >> user_notebooks >> fine_tune_multi_label.ipynb), with my own JSON data in the same format as "multi_label.json". How can I use this fine-tuned Sci-BERT for report analysis, so that running the pipeline with the command "tram pipeline run --model bert" uses the fine-tuned model? I want to do this for some experiments and testing on my system.

Thanks for your help in advance.

mehaase commented 5 months ago

Hi @abhishekdhiman25. You can download your model files (pytorch_model.bin and config.json) into the tram/data/ml-models/bert_model/ directory. If you are training on a different set of techniques, you may also need to adjust classes.txt to match.
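
The copy step described above can be sketched as a small script. This is only an illustration: the file names and the target directory come from the reply, while `install_model_files` is a hypothetical helper (not part of TRAM), and keeping a `.bak` copy of any file it overwrites is an added assumption so the shipped model can be restored.

```python
import shutil
from pathlib import Path

# File names and target directory are taken from the reply above.
MODEL_FILES = ("pytorch_model.bin", "config.json")
DEFAULT_TARGET = Path("tram/data/ml-models/bert_model")


def install_model_files(source_dir, target_dir=DEFAULT_TARGET, filenames=MODEL_FILES):
    """Copy fine-tuned model files into target_dir.

    Hypothetical helper for illustration; it is not part of TRAM.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)

    # Fail early if the fine-tuning output is incomplete.
    missing = [name for name in filenames if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {source_dir}: {', '.join(missing)}")

    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in filenames:
        destination = target_dir / name
        if destination.exists():
            # Assumption: keep the existing file as <name>.bak before overwriting.
            shutil.copy2(destination, destination.with_name(name + ".bak"))
        shutil.copy2(source_dir / name, destination)
        copied.append(destination)
    return copied
```

For example, `install_model_files("path/to/fine_tuned_output")` would be run from the repository root, where the placeholder path stands for wherever the notebook saved its output.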

abhishekdhiman25 commented 5 months ago

Hi @mehaase, thanks for your response. I compared both config.json files, the original in the "tram/data/ml-models/bert_model/" directory and the one produced by fine-tuning, and the data inside looks the same in both. Question: can you please confirm whether config.json needs to be changed manually after fine-tuning? I am using 100 ATT&CK labels for fine-tuning; do I need to change the JSON data in config.json for 100 labels?

Thanks for your help again.
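
Rather than comparing the two config.json files by eye, the comparison can be done programmatically. The sketch below is a minimal example using only the Python standard library; `diff_configs` is a hypothetical helper name, and it compares top-level keys only, so a nested value that differs is reported as a whole.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # placeholder shown when a key exists in only one file


def diff_configs(original_path, fine_tuned_path):
    """Return {key: (original_value, fine_tuned_value)} for top-level keys that differ.

    Hypothetical helper for illustration; it is not part of TRAM.
    """
    original = json.loads(Path(original_path).read_text(encoding="utf-8"))
    fine_tuned = json.loads(Path(fine_tuned_path).read_text(encoding="utf-8"))

    differences = {}
    for key in sorted(set(original) | set(fine_tuned)):
        old_value = original.get(key, MISSING)
        new_value = fine_tuned.get(key, MISSING)
        if old_value != new_value:
            differences[key] = (old_value, new_value)
    return differences
```

An empty result means the two files have identical top-level contents; any entry in the result shows which setting changed during fine-tuning.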

swfarnsworth commented 5 months ago

By way of correction, the classes.txt file needs to contain class labels in whatever order the model is expecting them. Changing the classes.txt file may break the model. Refer to my message in #214.
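
One way to check the ordering requirement described above is to compare classes.txt against the label list used during fine-tuning. The sketch below makes two assumptions that are not confirmed in this thread: that classes.txt holds one label per line, and that the training label order is available as a Python list. `class_order_mismatches` is a hypothetical helper name.

```python
from itertools import zip_longest
from pathlib import Path


def class_order_mismatches(classes_path, training_labels):
    """Return (index, label_in_file, label_in_training) for every position that differs.

    Assumes classes.txt contains one label per line (an assumption, not
    confirmed in this thread). Hypothetical helper; not part of TRAM.
    """
    text = Path(classes_path).read_text(encoding="utf-8")
    file_labels = [line.strip() for line in text.splitlines() if line.strip()]

    # zip_longest pads the shorter list with None, so length differences
    # are reported as mismatches too.
    return [
        (index, file_label, training_label)
        for index, (file_label, training_label) in enumerate(
            zip_longest(file_labels, training_labels)
        )
        if file_label != training_label
    ]
```

An empty list means the file and the training order agree position by position; any entry identifies an index at which the model's output would be mapped to the wrong label.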

mehaase commented 4 months ago

Closing this due to inactivity. Please re-open if your question has not been answered.