Charcoal-SE / metasmoke

Web dashboard for SmokeDetector.
https://metasmoke.erwaysoftware.com
Creative Commons Zero v1.0 Universal
43 stars, 34 forks

Can't update duplicate domain #405

Closed: tripleee closed this issue 6 years ago

tripleee commented 6 years ago

There are duplicate entries for some domains. This is otherwise rather harmless, but it prevents updating a domain record when it has a duplicate.

Back when domain tags were introduced, I sometimes changed the domain name from www.domain to just domain. This is now biting me in the rear. Can these somehow be reverted in bulk?

For example, I cannot save edits to https://metasmoke.erwaysoftware.com/domains/10010 (besthealthdiet.com) because it clashes with https://metasmoke.erwaysoftware.com/domains/14424 (nominally www.besthealthdiet.com, but I edited it to refer to besthealthdiet.com too).

I'll be happy to figure out which domains exactly need this, if there is some hope that they can all be fixed.

ArtOfCode- commented 6 years ago

cc'ing @Undo1 for console magic

Undo1 commented 6 years ago

I can definitely update them in bulk. Throw me a list of ids/strings and the desired transformation(s) and I'll get it done.

tripleee commented 6 years ago

Here: https://metasmoke.erwaysoftware.com/data/sql/queries/42-duplicate-domain-tags-405

tripleee commented 6 years ago

... Actually, it looks like my query unearthed more than just the ones I had manually edited, and possibly not those at all, but it is still a useful diagnostic (currently 39 rows). At least besthealthdiet.com and drozien.com in those results look like examples of what I originally reported, but some of the other results are plain duplicates (e.g. basij.um.ac.ir seems to have been extracted twice from a single post somehow).
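For readers without access to the linked query, a diagnostic of this kind can be sketched as follows. This is only an illustration, not the saved query itself: the `spam_domains` table name and its columns are assumptions, SQLite stands in for the production database, and the `example.com` row is made up. The two besthealthdiet.com ids are the ones quoted earlier in this thread.

```python
import sqlite3


def find_duplicate_domains(conn):
    """Return {domain: [ids]} for every domain string stored more than once."""
    rows = conn.execute(
        "SELECT domain, GROUP_CONCAT(id) FROM spam_domains "
        "GROUP BY domain HAVING COUNT(*) > 1 ORDER BY domain"
    ).fetchall()
    return {domain: sorted(int(i) for i in ids.split(",")) for domain, ids in rows}


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spam_domains (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO spam_domains (id, domain) VALUES (?, ?)",
    [
        (10010, "besthealthdiet.com"),
        (14424, "besthealthdiet.com"),  # was www.besthealthdiet.com before the manual edit
        (1, "example.com"),             # made-up row with no duplicate
    ],
)

duplicates = find_duplicate_domains(conn)
print(duplicates)
```

Grouping on the exact string only catches records whose names are already identical; www/non-www pairs that were never renamed would need the name normalized first.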

tripleee commented 6 years ago

The symptoms are more complex than I thought, and there may be more than a single root cause.

There does not seem to be a need to rename things, generally speaking.

I think that where I had manually renamed the www.something domain to just something, this would then cause a new www.something record to be created. Perhaps this is also why it looks as if some domains were extracted twice from some posts, but I can't completely confirm this; there may be a separate bug behind that.
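The suspected mechanism can be sketched like this, assuming (it is not confirmed in this thread) that domain records are looked up by exact string match and created when no match is found. `DomainStore` and its methods are hypothetical stand-ins, not metasmoke code.

```python
class DomainStore:
    """Hypothetical in-memory stand-in for the domain table."""

    def __init__(self):
        self.records = {}  # record id -> domain string
        self._next_id = 1

    def find_or_create(self, name):
        # Exact string match only: "something" does not match "www.something".
        for record_id, stored in self.records.items():
            if stored == name:
                return record_id
        record_id = self._next_id
        self._next_id += 1
        self.records[record_id] = name
        return record_id

    def rename(self, record_id, new_name):
        self.records[record_id] = new_name


store = DomainStore()
first = store.find_or_create("www.something")   # extracted from a post
store.rename(first, "something")                # manual edit of the record
second = store.find_or_create("www.something")  # next post: no match, so a new record

print(store.records)
```

Under that assumption the site ends up with two records, and renaming the second one as well would produce two records with the same name.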

In addition, it looks like some duplicate domain records were created simply because of a race condition.
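A race of this kind can be reproduced deterministically by letting two workers both run the existence check before either inserts. The sketch below is an assumption about how such duplicates might arise, not a diagnosis of metasmoke: the table names are hypothetical and SQLite stands in for the real database. It also shows that a UNIQUE column makes the second insert a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one without and one with a uniqueness guarantee.
conn.execute("CREATE TABLE racy (id INTEGER PRIMARY KEY, domain TEXT)")
conn.execute("CREATE TABLE guarded (id INTEGER PRIMARY KEY, domain TEXT UNIQUE)")

NAME = "example.com"


def exists(table, name):
    row = conn.execute(f"SELECT 1 FROM {table} WHERE domain = ?", (name,)).fetchone()
    return row is not None


def count(table, name):
    query = f"SELECT COUNT(*) FROM {table} WHERE domain = ?"
    return conn.execute(query, (name,)).fetchone()[0]


# Check-then-insert: both workers check before either one has inserted.
worker_a_found = exists("racy", NAME)
worker_b_found = exists("racy", NAME)
for found in (worker_a_found, worker_b_found):
    if not found:
        conn.execute("INSERT INTO racy (domain) VALUES (?)", (NAME,))
racy_count = count("racy", NAME)

# Same two inserts against a UNIQUE column: the second one is ignored.
for _ in range(2):
    conn.execute("INSERT OR IGNORE INTO guarded (domain) VALUES (?)", (NAME,))
guarded_count = count("guarded", NAME)

print(racy_count, guarded_count)
```

`INSERT OR IGNORE` is SQLite syntax; other databases spell the same idea differently.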

The end result is that many records are redundant and I could simply remove them myself; others are more complex and need some manual merging.
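A merge of two duplicate records might look roughly like the sketch below: repoint every post link from the duplicate to the record being kept, drop links that would collide, then delete the duplicate. The `domains` and `post_domains` tables and the post ids are assumptions made for illustration, and SQLite stands in for the production database; only the two domain ids come from this thread.

```python
import sqlite3


def merge_domains(conn, keep_id, duplicate_id):
    """Move all post links from duplicate_id to keep_id, then delete the duplicate."""
    with conn:  # one transaction: commit on success, roll back on error
        # OR IGNORE skips links whose post is already linked to keep_id.
        conn.execute(
            "UPDATE OR IGNORE post_domains SET domain_id = ? WHERE domain_id = ?",
            (keep_id, duplicate_id),
        )
        # Whatever still points at the duplicate was a redundant link.
        conn.execute("DELETE FROM post_domains WHERE domain_id = ?", (duplicate_id,))
        conn.execute("DELETE FROM domains WHERE id = ?", (duplicate_id,))


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (id INTEGER PRIMARY KEY, domain TEXT)")
conn.execute(
    "CREATE TABLE post_domains (post_id INTEGER, domain_id INTEGER, "
    "PRIMARY KEY (post_id, domain_id))"
)
conn.executemany(
    "INSERT INTO domains VALUES (?, ?)",
    [(10010, "besthealthdiet.com"), (14424, "besthealthdiet.com")],
)
conn.executemany(
    "INSERT INTO post_domains VALUES (?, ?)",
    [(1, 10010), (1, 14424), (2, 14424)],  # made-up post ids
)

merge_domains(conn, keep_id=10010, duplicate_id=14424)

links = conn.execute(
    "SELECT post_id, domain_id FROM post_domains ORDER BY post_id"
).fetchall()
remaining = [row[0] for row in conn.execute("SELECT id FROM domains")]
print(links, remaining)
```

Running the three statements in one transaction means a failure partway through leaves both records untouched.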

I've tried to list these in order of complexity, hardest first, but I might not have a full understanding of what it takes to fix them.

There is a duplicate for www.timberdoorsmelbourne.net.au but it's because there are two post records for the same post; 95607 should be deleted as it duplicates 95606 and then the domain record 16386 can be zapped. (done - Art)


These were so trivial I could handle them myself. I'm listing them here for reference.

Undo1 commented 6 years ago

Manual console magic should be done - see edits to above comment.