bangawayoo / nlp-watermarking

Robust natural language watermarking using invariant features

Example broken by custom keywords #2

Closed FabienRoger closed 1 year ago

FabienRoger commented 1 year ago

When running scripts/example/run-imdb.sh I get the following stack trace:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ubuntu/nlp-watermarking/./ours.py:37 in <module>                                           │
│                                                                                                  │
│    34 spacy_tokenizer = spacy.load(generic_args.spacy_model)                                     │
│    35 if "trf" in generic_args.spacy_model:                                                      │
│    36 │   spacy.require_gpu()                                                                    │
│ ❱  37 model = InfillModel(infill_args, dirname=dirname)                                          │
│    38                                                                                            │
│    39 _, cover_texts = model.return_dataset()                                                    │
│    40                                                                                            │
│                                                                                                  │
│ /home/ubuntu/nlp-watermarking/models/watermark.py:70 in __init__                                 │
│                                                                                                  │
│    67 │   │   │   │   │   │   │   │   │   │     mask_order_by=args.mask_order_by,                │
│    68 │   │   │   │   │   │   │   │   │   │     keyword_mask=args.keyword_mask,                  │
│    69 │   │   │   │   │   │   │   │   │   │     exclude_cc=args.exclude_cc,                      │
│ ❱  70 │   │   │   │   │   │   │   │   │   │     custom_keywords=args.custom_keywords             │
│    71 │   │   │   │   │   │   │   │   │   │     )                                                │
│    72 │   │   self.nlp = spacy.load(args.spacy_model)                                            │
│    73                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Namespace' object has no attribute 'custom_keywords'

bangawayoo commented 1 year ago

Hi, the custom keyword functionality was only used in demo.py, in case anyone wanted to exclude keywords or proper nouns that the spaCy model does not detect.

If you'd like to use it for IMDb as well, just add this line in ours.py. InfillModel will then handle it and exclude those keywords from being masked out.
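
As a rough illustration of the kind of change meant here, the sketch below gives an argparse `Namespace` a `custom_keywords` attribute when it is missing, which is what the `AttributeError` in the traceback complains about. It is not the exact line from the repository: the helper `ensure_custom_keywords`, the placeholder argument values, and the keyword `"IMDb"` are all assumptions made for the example.

```python
from argparse import Namespace


def ensure_custom_keywords(args, keywords=None):
    """Hypothetical helper: add `custom_keywords` to `args` if it is missing."""
    if not hasattr(args, "custom_keywords"):
        # Default to an empty list so that no extra words are excluded.
        args.custom_keywords = list(keywords or [])
    return args


# Stand-in for the parsed infill arguments in ours.py (placeholder values).
infill_args = Namespace(keyword_mask="placeholder", exclude_cc=True)
ensure_custom_keywords(infill_args, ["IMDb"])
print(infill_args.custom_keywords)

# A namespace that already defines the attribute is left unchanged.
demo_args = ensure_custom_keywords(Namespace(custom_keywords=["watermark"]))
print(demo_args.custom_keywords)
```

In ours.py the equivalent step would presumably go before `InfillModel(infill_args, dirname=dirname)` is constructed, since that is where the attribute is first read.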

Hope this helps!

bangawayoo commented 1 year ago

Hi @FabienRoger, I am closing the issue due to inactivity. Feel free to reopen it if you run into further problems.

awaylong commented 8 months ago

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight']