NCATComp410 / comp410_summer_2023

Repository for COMP-410 summer 2023
GNU General Public License v3.0

Added detection for credit card number #31

brmtalla closed 1 year ago

brmtalla commented 1 year ago

> I think you should be using the CREDIT_CARD entity?

I tried that at first, but would get the error "'CREDIT_CARD' not found in '[type: US_BANK_NUMBER, start: 0, end: 16, score: 0.05, type: US_DRIVER_LICENSE, start: 0, end: 16, score: 0.01]", so I don't think that's an entity I can use.

claesmk commented 1 year ago

The issue here is that the credit card numbers in your test cases do not look like potentially valid credit card numbers. Even the example you included in your issue is not detected, because it does not pass the validation steps that Microsoft has chosen to implement.

Usually the easiest way to determine what's going wrong is to look at the underlying source code that is responsible for the detection.
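
As an illustration of the kind of validation involved, here is a minimal sketch of the Luhn checksum, which card numbers are generally expected to satisfy. It is an assumption that this checksum is among the validation steps mentioned above, and `luhn_valid` is a hypothetical helper written for this comment, not code taken from the detection library. It would explain why a made-up number such as 1234 5678 9012 3456 is not reported as a credit card.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep only the digits, so separators such as spaces are ignored
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == '__main__':
    print(luhn_valid('4111 1111 1111 1111'))  # True
    print(luhn_valid('1234 5678 9012 3456'))  # False
```

A check like this only says whether a number is well formed. It says nothing about whether the card exists.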

There are multiple sources which document some example credit card numbers. Square's developer documentation is a good choice.

From that you could construct a test case that looks something like this:

    def test_cc_number_detection(self):
        # Sample card numbers published in Square's sandbox documentation:
        # https://developer.squareup.com/docs/devtools/sandbox/payments
        cc_nums = {'Visa': '4111 1111 1111 1111',
                   'Mastercard': '5105 1051 0510 5100',
                   'Discover': '6011 0000 0000 0004',
                   'Diners Club': '3000 000000 0004',
                   'JCB': '3566 1111 1111 1113',
                   'American Express': '3400 000000 00009'}

        # Each (possibly) valid number should be reported as CREDIT_CARD.
        # subTest reports every failing card type instead of stopping at the first.
        for cc_type, cc_num in cc_nums.items():
            with self.subTest(cc_type=cc_type, cc_num=cc_num):
                results = analyze_text(cc_num)
                print('testing {} number: {} -> {}'.format(cc_type, cc_num, results))
                self.assertIn('CREDIT_CARD', str(results))

        # An invalid credit card number should not be reported as CREDIT_CARD
        results = analyze_text('1234 5678 9012 3456')
        self.assertNotIn('CREDIT_CARD', str(results))

Using published sample credit card numbers is better than making them up on your own. If you make up a number, you could make a lucky guess and end up using someone's real credit card number. Even though a number without an expiration date or security code is generally useless, you wouldn't want a real number in your source code.

claesmk commented 1 year ago

@brmtalla this looks ready for you to merge and delete your branch & Codespaces