GrandComicsDatabase / gcd-django

Website of Grand Comics Database
GNU General Public License v3.0

Semi-colons inside parentheses confuse credit migration #562

Open LegalizeAdulthood opened 1 year ago

LegalizeAdulthood commented 1 year ago

Suppose we have a credit like this:

Bob Layton (editor; editor-in-chief)

The migration logic does a blind split of the string on ; and thinks this is two credits, not one. If we replace the blind split with a tokenizer then the semi-colon can be preserved as a detail note on the individual credit.

This doesn't happen too often in the thousands of credits I've migrated, but it does occur often enough to think about enhancing the algorithm.
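As a rough illustration of the tokenizer idea, here is a minimal sketch that splits on `;` only when the character sits outside parentheses or brackets. The function name `split_credits` and the second credit, `Jane Doe (inks)`, are hypothetical and used only for illustration. This is not the actual gcd-django migration code.

```python
def split_credits(text: str) -> list[str]:
    """Split a credit string on ';' only when outside (...) or [...].

    Hypothetical helper: a sketch of the idea, not the real migration logic.
    """
    credits = []
    current = []
    depth = 0  # nesting depth of parentheses/brackets seen so far
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)  # tolerate unbalanced closers
        if ch == ";" and depth == 0:
            # Top-level separator: close off the current credit.
            credits.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    credits.append("".join(current).strip())
    return [c for c in credits if c]  # drop empty pieces, e.g. a trailing ';'


print(split_credits("Bob Layton (editor; editor-in-chief); Jane Doe (inks)"))
# → ['Bob Layton (editor; editor-in-chief)', 'Jane Doe (inks)']
```

With this approach the inner semi-colon stays inside the parenthesized text, so it could be carried over as a detail note on the single credit.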

gcd-github commented 1 year ago

It is possible, though, that a single person held both positions in an organization simultaneously, perhaps during a transition in one position or for legal or bureaucratic reasons. So it might actually be two credits as found in an issue; be careful with assumptions.

On 10/17/2022 12:19 PM, 'Richard Thomson' via gcd-tech wrote:

Suppose we have a credit like this:

Bob Layton (editor; editor-in-chief)

The migration logic does a blind split of the string on ; and thinks this is two credits, not one. If we replace the blind split with a tokenizer then the semi-colon can be preserved as a detail note on the individual credit.

This doesn't happen too often in the thousands of credits I've migrated, but it does occur often enough to think about enhancing the algorithm.

jochengcd commented 1 year ago

This is actually a wrong use of ';' in the notes, should be just ','.

LegalizeAdulthood commented 1 year ago

> This is actually a wrong use of ';' in the notes, should be just ','.

This is what I did when migrating that particular credit.

It also comes up in detailed descriptions of the panels/pages an artist worked on, e.g. when the indexer writes something like:

Steve Ditko (page 1, panel 4, 5; page 2, panel 3, 4; page 4, panel 6)

Sometimes it is clear that they intended two credits, e.g.:

? [as Crazy Gang; Bumble Brothers]
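Since the examples above show that the intent behind a nested semi-colon cannot always be decided mechanically, one option is to flag such credits for manual review instead of guessing. The following is a minimal sketch of that idea. The helper name `nested_semicolon_groups` is hypothetical, and the sketch assumes groups are not nested inside one another.

```python
import re

# Matches a non-nested (...) group or a non-nested [...] group.
_GROUP_RE = re.compile(r"\(([^()]*)\)|\[([^\[\]]*)\]")


def nested_semicolon_groups(credit: str) -> list[str]:
    """Return the contents of each (...) or [...] group that contains ';'.

    Hypothetical helper: a non-empty result means the credit probably
    needs a human to decide whether it is one credit or several.
    """
    found = []
    for paren_text, bracket_text in _GROUP_RE.findall(credit):
        inner = paren_text or bracket_text
        if ";" in inner:
            found.append(inner)
    return found


for credit in (
    "Steve Ditko (page 1, panel 4, 5; page 2, panel 3, 4; page 4, panel 6)",
    "? [as Crazy Gang; Bumble Brothers]",
    "Bob Layton (editor)",
):
    print(credit, "->", "review" if nested_semicolon_groups(credit) else "ok")
```

The first two credits would be flagged for review and the last one would pass through unchanged.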