barthelemymp / TULIP-TCR


KeyError: 'num_attn_heads_encoder_pep' #12

Open SteamedGit opened 4 months ago

SteamedGit commented 4 months ago

I'm trying to run full_learning_new.py with the command:

python full_learning_new.py --train_dir data/train.csv --test_dir data/tune.csv --modelconfig configs/shallow.config.json --save models/ --batch_size 128

I've tried shallow.config.json since that's what is used in run_full_learning.sh.

barthelemymp commented 4 months ago

Hi,

What is your question? I would recommend using the --skipMiss option.

SteamedGit commented 4 months ago

My question is: why don't the args in run_full_learning.sh work?

barthelemymp commented 4 months ago

Ah sorry, the error was in the title. I'll look into that...

barthelemymp commented 4 months ago

Could you use shallow0_decoupled.config.json? In the latest training script I introduced the possibility of using different configs for the peptide encoder and the CDR encoders. shallow0_decoupled.config.json is the same as configs/shallow.config.json.

Thanks for the remark, I pushed the correction.

SteamedGit commented 4 months ago

Ok thanks. I've been able to train with that config. Unfortunately, predict.py does not work with shallow0_decoupled.config.json.

python predict.py --test_dir data/tune.csv --modelconfig configs/shallow0_decoupled.config.json --load models/300/model.safetensors --output data_output/

Gives the error:

File "TULIP-TCR/predict.py", line 117, in main num_attention_heads = modelconfig["num_attn_heads"], KeyError: 'num_attn_heads'

SteamedGit commented 4 months ago

Switching to configs/shallow.config.json, I get a different error:

loading hyperparameter
Using device: cuda
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
55
Loading models ..
self.pad_token_id 1
self.pad_token_id 1
self.pad_token_id 1
Traceback (most recent call last):
  File "TULIP-TCR/predict.py", line 183, in <module>
    main()
  File "TULIP-TCR/predict.py", line 155, in main
    checkpoint = torch.load(args.load)
  File "/opt/conda/envs/tulipenv/lib/python3.10/site-packages/torch/serialization.py", line 1040, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/envs/tulipenv/lib/python3.10/site-packages/torch/serialization.py", line 1262, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: unpickling stack underflow

Maybe this is related to full_learning_new.py saving the model in safetensors format, while the example model weights (and the loading code in predict.py) use the .bin format?
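
For reference, a minimal sketch of handling both checkpoint formats by file extension (the load_checkpoint helper is illustrative, not part of the repo):

import torch
from safetensors.torch import load_file

def load_checkpoint(path, device="cpu"):
    # safetensors files are not pickles, so torch.load() raises an UnpicklingError on them
    if path.endswith(".safetensors"):
        return load_file(path, device=device)
    # legacy pytorch_model.bin checkpoints are pickled state dicts
    return torch.load(path, map_location=device)

checkpoint = load_checkpoint("models/300/model.safetensors")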

barthelemymp commented 4 months ago

Can you compare the folder saved with your model and mine? Btw, how come you have already trained your model? (Seems super fast.)

barthelemymp commented 4 months ago

should contain:

config.json
generation_config.json
pytorch_model.bin

SteamedGit commented 4 months ago

I managed to train a model yesterday with a newer version of transformers that seems to save the model as model.safetensors. To load models in this newer format, I think you can use safetensors.torch.load_model(model, args.load).
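
A minimal usage sketch of that call in place of torch.load (load_model is available in recent safetensors releases; the placeholder model is mine, in predict.py it would be the TULIP model built earlier in main()):

import torch
from safetensors.torch import load_model

# Placeholder module just so the snippet runs on its own.
model = torch.nn.Linear(8, 8)

# load_model copies the tensors from the file into the model in place
# (no torch.load / load_state_dict needed) and reports key mismatches.
missing, unexpected = load_model(model, "models/300/model.safetensors", strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)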