center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.
https://ctid.mitre-engenuity.org/our-work/tram/
Apache License 2.0
422 stars 90 forks

Need Help: Error Encountered while running tram pipeline with bert (in developer's setup) #209

Closed abhishekdhiman25 closed 5 months ago

abhishekdhiman25 commented 5 months ago

Hi Reader, I have completed all the steps in the developer's setup guide, and TRAM is running in my local environment (Windows 10). The other ML models run successfully, but when I run the command "tram pipeline run --model bert", it throws this error:

    raise HFValidationError(
    huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/tram/data/priv-allenai-scibert-scivocab-uncased'. Use repo_type argument if needed.

The error comes from line 372 of base.py (path: tram/src/tram/ml/base.py). The code on line 372 is:

    tokenizer = AutoTokenizer.from_pretrained("/tram/data/priv-allenai-scibert-scivocab-uncased")

I do not see any "priv-allenai-scibert-scivocab-uncased" directory under the data directory. How can I fix this error?

Thanks in advance for your help.
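For readers hitting the same message: `AutoTokenizer.from_pretrained` accepts either a local directory or a Hugging Face Hub repo id. When the string is not an existing directory, it is treated as a repo id, and a path such as '/tram/data/priv-allenai-scibert-scivocab-uncased' fails repo id validation. The sketch below only mirrors that first decision with the standard library; `classify_model_source` is a hypothetical helper name, not part of TRAM or transformers.

```python
from pathlib import Path


def classify_model_source(name_or_path: str) -> str:
    """Roughly mirror how a model argument is interpreted.

    Hypothetical helper for illustration only: returns "local" when the
    argument is an existing directory, otherwise "hub" (meaning the string
    would be handed to the Hub as a repo id).
    """
    return "local" if Path(name_or_path).is_dir() else "hub"


# Path taken from the error message in this issue.
print(classify_model_source("/tram/data/priv-allenai-scibert-scivocab-uncased"))
```

If this prints "hub" on your machine, the directory is missing and the tokenizer files still need to be downloaded.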

mehaase commented 5 months ago

Hi @abhishekdhiman25, thank you for trying out TRAM! I apologize for the issue you ran into. I just pushed an update to main that should help. Please pull the latest commit, then run this command to download the tokenizer model into your directory tree:

python3 -c "import transformers; tokenizer = transformers.AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased'); tokenizer.save_pretrained('data/ml-models/priv-allenai-scibert-scivocab-uncased')"

I updated the developer wiki to add this step, as well.
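The one-liner above can also be written as a short script, which may be easier to read and rerun. This is a minimal sketch, assuming it is run from the repository root with `transformers` installed and network access available. The model id and target directory are copied from the command above; the function name `save_tokenizer` is hypothetical.

```python
from pathlib import Path

# Both values are copied from the command in this comment.
MODEL_ID = "allenai/scibert_scivocab_uncased"
TARGET_DIR = Path("data/ml-models/priv-allenai-scibert-scivocab-uncased")


def save_tokenizer(model_id: str = MODEL_ID, target_dir: Path = TARGET_DIR) -> Path:
    """Download a tokenizer from the Hugging Face Hub and save it locally.

    Hypothetical helper for illustration; it is not part of TRAM.
    """
    # Imported here so that defining the function does not require transformers.
    from transformers import AutoTokenizer

    target_dir.mkdir(parents=True, exist_ok=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)  # needs network access
    tokenizer.save_pretrained(str(target_dir))
    return target_dir
```

Calling `save_tokenizer()` once should leave the tokenizer files in the target directory, which is what the one-liner does.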

abhishekdhiman25 commented 5 months ago

Thanks for the help. The issue is resolved.