NVIDIA / garak

the LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
2.89k stars 246 forks

add support for loading renamed plugins #889

Open leondz opened 2 months ago

leondz commented 2 months ago

e.g. knownbadsignatures became av_spam_scanning

This should be caught in load_plugin and in reporting

jmartin-tech commented 2 months ago

A proposal for this, needs thought and feedback.

Scenarios needing support:

Proposed implementation with future developer standards expectation:

With these changes supported, the plugin cache value might look like:

    "probes.av_spam_scanning.EICAR": {
      "description": "Does the model check its output for viruses? Try to send the EICAR code through",
      "ALIASES": [ "knownbadsignatures.EICAR" ],
      "DEFAULT_PARAMS": {
        "generations": 1
      },
      "active": true,
      "bcp47": "*",
      "doc_uri": "https://en.wikipedia.org/wiki/EICAR_test_file",
      "extended_detectors": [],
      "goal": "output antivirus test codes",
      "modality": {
        "in": [
          "text"
        ]
      },
...
    "probes.continuation.ContinueSlursReclaimedSlursMini": {
      "description": "Continue partial word with slur or reclaimed slur.",
      "ALIASES": [ "continuation.ContinueSlursReclaimedSlurs80" ],
      "DEFAULT_PARAMS": {
        "generations": 1
      },
      "active": true,
      "bcp47": "en",
      "doc_uri": "",
      "extended_detectors": [],
      "goal": "output an ethnic slur",
      "modality": {
        "in": [
          "text"
        ]
      },
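
As a rough illustration of how an `ALIASES` entry like the ones above could be used at load time, here is a minimal sketch. The names `build_alias_index` and `resolve_plugin_name`, the flat cache shape, and the idea that an alias inherits the category prefix (`probes.`) of its owning entry are all assumptions made for this example, not garak's actual implementation.

```python
# Hypothetical cache fragment: only the ALIASES field from the proposal is kept.
PLUGIN_CACHE = {
    "probes.av_spam_scanning.EICAR": {
        "ALIASES": ["knownbadsignatures.EICAR"],
    },
    "probes.continuation.ContinueSlursReclaimedSlursMini": {
        "ALIASES": ["continuation.ContinueSlursReclaimedSlurs80"],
    },
}


def build_alias_index(cache):
    """Map each old fully qualified name to the current cache key."""
    index = {}
    for current_name, info in cache.items():
        # Assumption: aliases omit the category, so reuse the owner's prefix.
        category = current_name.split(".", 1)[0]
        for alias in info.get("ALIASES", []):
            index[f"{category}.{alias}"] = current_name
    return index


def resolve_plugin_name(requested, cache, index=None):
    """Return the current name for `requested`, following an alias if needed."""
    if requested in cache:
        return requested
    if index is None:
        index = build_alias_index(cache)
    if requested in index:
        return index[requested]
    raise KeyError(f"unknown plugin: {requested}")
```

A caller that still asks for `probes.knownbadsignatures.EICAR` would be handed `probes.av_spam_scanning.EICAR`, while current names pass through unchanged and unknown names still fail loudly.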

Pros:

Cons:

leondz commented 2 months ago

brief notes, will revisit this:

z-score calibration files contain only the latest plugin name

a test should validate the names in the calibration against the currently available range of plugins
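
a rough sketch of what such a test could look like. the helper name `stale_calibration_names` and the flat lists of names are assumptions for illustration; the real calibration file layout and plugin enumeration would need to be plugged in.

```python
def stale_calibration_names(calibration_names, available_plugins):
    """Return calibration names that match no currently available plugin."""
    available = set(available_plugins)
    return sorted(name for name in set(calibration_names) if name not in available)


def test_calibration_names_are_current():
    # Hypothetical data: in practice these would be read from the calibration
    # file and from the plugin cache respectively.
    calibration_names = ["av_spam_scanning.EICAR"]
    available_plugins = [
        "av_spam_scanning.EICAR",
        "continuation.ContinueSlursReclaimedSlursMini",
    ]
    assert stale_calibration_names(calibration_names, available_plugins) == []
```

a calibration file that still mentions `knownbadsignatures.EICAR` after the rename would show up in the returned list and fail the test.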

resolve class name conversion during spec evaluation

limited magic involved by focusing impact to configuration parsing

i like this. if you walk in through the front door, you get first class service. if you need grit, that's fine too (i.e. not too much magic)

module rename would inject multiple constants if more than one class exists in the module

indeed. this seems like it could do with a rework

report comparison is more complex when report uses a different name from original run

not yet implemented, but yes

programmatic access to plugins that does not use spec parsing to select plugins may still report that an invalid config was requested if the constants in the consuming tool are not maintained.

we already fail gracefully for at least some plugins. not sure how best to surface this

jmartin-tech commented 4 weeks ago

Working through backward-compatible config, I had some thoughts to work out:

Instead of handling compatibility at runtime, should we isolate configuration compatibility to companion tooling or a CLI option? This is something I have seen work well in tools like packer, which offers a fix functionality: any breaking change to a config is expected to provide a fixer plugin that can be applied to output a template moved forward in compatibility, instead of executing the template. (Note: I suspect this may align with the Pythonic preference for explicit over implicit functionality.)

Thinking further down the config migration support path, this would still include adding aliases on renames, but it would allow maintenance to be more targeted and, I think, reduce the burden of maintaining configuration files. Each released version could then update an older config. This might act as a pattern for enabling #931 more quickly.
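
To make the fixer idea concrete, here is a minimal sketch of a rename fixer that rewrites an old config and returns the updated copy instead of running it. The config shape (a `plugins` mapping with a comma-separated `probe_spec` string), the `RENAMES` table, and the function names are assumptions for illustration only; the single rename listed is the one from the top of this issue.

```python
import copy

# Module renames a release would ship with its fixer (old name -> new name).
RENAMES = {"knownbadsignatures": "av_spam_scanning"}


def fix_probe_spec(spec, renames):
    """Rewrite renamed modules in a comma-separated probe spec string."""
    fixed = []
    for item in spec.split(","):
        item = item.strip()
        # "module.Class" -> ("module", ".", "Class"); bare "module" has no class.
        module, sep, cls = item.partition(".")
        fixed.append(renames.get(module, module) + sep + cls)
    return ",".join(fixed)


def fix_config(config, renames=RENAMES):
    """Return an updated copy of `config`; the input is left untouched."""
    updated = copy.deepcopy(config)
    plugins = updated.get("plugins", {})
    if isinstance(plugins.get("probe_spec"), str):
        plugins["probe_spec"] = fix_probe_spec(plugins["probe_spec"], renames)
    return updated
```

Under these assumptions, a config requesting `knownbadsignatures.EICAR` would come back requesting `av_spam_scanning.EICAR`, and the user decides explicitly whether to save the fixed file.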