center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.
https://ctid.mitre-engenuity.org/our-work/tram/
Apache License 2.0

Pretrained BERT model? #197

priamai closed this issue 1 year ago

priamai commented 1 year ago

Hi there, I have just deployed the latest version via Docker and noticed that there are only 2 pretrained models.

image

It would be useful to know how to: a) train SciBERT on an annotated dataset (the link is broken; I guess it points to a private repo), and b) download a pretrained SciBERT model.

Cheers! @mehaase

mehaase commented 1 year ago

Hi @priamai, that screen is a bit misleading. It shows stats only for the models that were trained inside the container; the SciBERT model is trained outside the container (by us, on high-end GPUs) and downloaded into the Docker container. If you want to fine-tune the model on your own data, we have some Jupyter notebooks to facilitate that: https://github.com/center-for-threat-informed-defense/tram/wiki/Large-Language-Models#jupyter-notebooks

(I also fixed the broken link that you were looking at: https://github.com/center-for-threat-informed-defense/tram/wiki/Data-Annotation)

priamai commented 1 year ago

Hi @mehaase, but when I upload a report it doesn't let me choose the model, so does it default to SciBERT? Thanks for fixing the link! I love the Colab notebooks, so we can fine-tune for free on Colab!

mehaase commented 1 year ago

Yes, it defaults to SciBERT. The choice of model is specified in entrypoint.sh.
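
For readers who want to check which model their own container is configured to use, here is a minimal sketch of one way to scan a script such as entrypoint.sh for the model setting. The thread does not show the contents of entrypoint.sh, so the `--model` flag, the sample command line, and the helper name `find_model_choice` are all assumptions made for illustration; adjust the pattern to whatever the actual script contains.

```python
import re


def find_model_choice(script_text: str) -> list[str]:
    """Return every value passed to a `--model` flag in the given script text.

    ASSUMPTION: the script selects the model through a `--model <name>` or
    `--model=<name>` argument. This flag name is hypothetical and is not
    confirmed by the thread.
    """
    return re.findall(r"--model[=\s]+([\w-]+)", script_text)


# Hypothetical sample line, NOT copied from the real entrypoint.sh.
sample_script = "python manage.py pipeline run --model bert\n"
print(find_model_choice(sample_script))
```

To run this against a real deployment, read the script's text first (for example with `pathlib.Path("entrypoint.sh").read_text()`) and pass it to the function; an empty list would mean the assumed flag does not appear in the script.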