dwmkerr / crosswords-js

Tiny, lightweight crossword control for the web.
https://dwmkerr.github.io/crosswords-js/
MIT License
71 stars 27 forks source link

Use data attribute instead of id for cell tagging #50

Closed mmkal closed 7 months ago

mmkal commented 8 months ago

Hi - opening this up for the sake of discussion initially - and hopefully eventually merging. I didn't spot this in #42 but I think it's worth addressing one way or another (even if it's not exactly this solution).

This change stops using id on cell elements. Instead it now uses a custom data-* attribute.

The main reason for this being that this isn't what an id is intended for - it's supposed to globally, uniquely, tag an HTML element. This could cause issues on pages that want to render multiple crosswords (something we are planning on - we want to have a page which renders several scrollable quick-fire crosswords stacked on top of each other so players can challenge themselves to see how fast they can scroll through all of them). It's a big no-no to have two elements with duplicate ids (although browsers will render them). Even if crosswords-js is careful to avoid the pitfalls of having non-unique ids, applications and testing libraries like playwright could still fall foul of them. I remember a discussion in one of the PRs about ids - I thought the upshot was they make sense in the sample app, but not as part of the library.

A second reason is that having ids that use special characters ((, , and )) are at odds with almost all ids I've seen in my past, and all the styleguides/naming conventions I could find from googling. So I think this change is better in terms of the principle of least astonishment.

Google's result for "html data attribute vs id" summarises this quite well:

image

The attribute looks like data-xy="3,4". I also went from (3,4) to 3,4 because:

Squares.io example:

image

crosswords-js example with this change:

image

Some other changes along the way:

Implementation note:

Happy to discuss alternatives - but I do think this is an important change for this library to get to a stable version (v1) - otherwise people will start relying on id and it'll be very hard to change.

mmkal commented 8 months ago

Oh @pvspain I meant to ask, do you remember what the problem was that led to 'id' is not used, but assignment silences chromium-dev-tools issue - and adding the input-${modelCell} id to the input elements? I should make sure this doesn't regress that problem (and see if we can remove the tag there without regressing it).

pvspain commented 8 months ago

Hi - opening this up for the sake of discussion initially - and hopefully eventually merging. I didn't spot this in #42 but I think it's worth addressing one way or another (even if it's not exactly this solution).

This change stops using id on cell elements. Instead it now uses a custom data-* attribute.

The main reason for this being that this isn't what an id is intended for - it's supposed to globally, uniquely, tag an HTML element.
... Even if crosswords-js is careful to avoid the pitfalls of having non-unique ids, applications and testing libraries like playwright could still fall foul of them. I remember a discussion in one of the PRs about ids - I thought the upshot was they make sense in the sample app, but not as part of the library.

In the earlier discussion I recall floating the idea of using a gridview-instance-unique prefix for its ids to address the multi-gridview webpage. Something along the lines of a namespace-ish tag (e.g. cwjs) with a singleton generator sequence value like this tacked onto it. That's just a strawman or talking point.

A second reason is that having ids that use special characters ((, , and )) are at odds with almost all ids I've seen in my past, and all the styleguides/naming conventions I could find from googling. So I think this change is better in terms of the principle of least astonishment.

Ids were introduced into the code to optimise the performance of the CellMap class.

That class is used by the grid element helpers in the controller, which are used heavily throughout the controller code. Originally, the (modelCell,cellElement) mappings were stored in an array, and retrieving a mapping in either direction was achieved by a linear scan of that array.

I changed the implementation so the cellElements became properties on a internal modelCells object within a CellMap, and we could leverage the JS object-property lookup, which is presumably a hash-like arrangement and has close to constant-time efficiency. This required the cellElement (the key - modelCells.{property-name}) to have an (unique) id value. I reasoned the best id for a cellElement in this context was the id for it's mapped modelCell, so mapping a cellElement to it's modelCell simplified to a property reference.

I had previously added .toString() methods to clue and modelCell, which work something like Python's dynamic attributes. This handy feature implicitly casts an Object to its string value when the context requires a string, such as in parameterised strings. I use this a lot in console messages. For the console message context, I chose the cell's grid coordinates, (x,y).

Following the KISS principle, I thought these methods would be a quick and cheap solution to the id requirement in the new CellMap.

The attribute looks like data-xy="3,4". I also went from (3,4) to 3,4 because:

  • that matches what squares.io does. Like it or not, speaking with setters and solvers alike, squares.io is used by many for general experimentation. I don't think going for 100% parity is necessary/ a good use of effort, but given I'm proposing a change already, may as well nudge in the direction of friendliness/compatibility.
  • It makes the HTML and code relying on it more readable. I have some playwright tests on my web app that do things like await page.click('[data-xy="3,4"]') - there is less confusing punctuation there than with await page.click('[data-xy="(3,4)"]')

The HTML-id context is quite different to the CellMap and console-message contexts, as you have elaborated. Your HTML-id context falls squarely under the scope of the HTML dataset property as you've described. I feel your notation is too verbose when you're scanning console messages spewing into a narrow column!

I don't think the different contexts need to use the same object property (id). As we have different contexts, perhaps neither context should use the unadorned id - to prevent ambiguity. On second thought, I guess you have no choice but to use the id property on a HTML element class. haven't looked into your implementation yet, so I'll shut up now and do that, as I may be wandering off on a tangent!

Happy to discuss alternatives - but I do think this is an important change for this library to get to a stable version (v1) - otherwise people will start relying on id and it'll be very hard to change.

Agreed.

pvspain commented 8 months ago

Oh @pvspain I meant to ask, do you remember what the problem was that led to 'id' is not used, but assignment silences chromium-dev-tools issue - and adding the input-${modelCell} id to the input elements? I should make sure this doesn't regress that problem (and see if we can remove the tag there without regressing it).

The first Issues tab entry in the screencap of the Chromium Dev Tools... Just trying to reduce the signal-to-noise ratio in reported problems! Screenshot 2023-10-26 22 27 05

mmkal commented 8 months ago

Thank you, useful context. It might be helpful to go over the use cases/concerns for this tag:

Performance in this library

I would be very surprised if this PR regressed performance within crosswords-js. The only changes are cellElement.id -> cellElement.dataset.xy. Both should be O(1) lookups. We're just looking up in a hashmap in CellMap as you say, so there should be no difference here.

Performance in userland (via document.querySelector)

If library users are relying on anything like document.getElementById('(3,4)'), they would have to switch to document.querySelector('[data-xy="3,4"]') which might be a little slower. But, from what I read online, the difference should be infinitesimal and the latter should still be executed at a rate in the order of millions of queries per second, so I don't think this will ever going to be a bottleneck, unless someone builds a crossword with literally thousands of rows and columns (an interesting use case actually!). Also there is nothing stopping an app user adding their own ids to the cell elements in a subclass of this constructor (I am doing something like this to get the data-xy attribute already). And I think that's appropriate - IMO ids should be set in apps, not libraries. So I don't think this should be a concern.

Debug logging

Yeah, let's make sure this doesn't get overly verbose. This is what they look like with this change (currentCell: 1,0 etc.), I think it's pretty compact + readable?

image

too verbose

Does the above cover your concern re verbosity, or did I misunderstand which logs you meant?

Chrome dev tools issues

Ah - thank you for the pointer! Just seen A form field element has neither an id nor a name attribute. I just pushed a commit 6cb7c9b8613885cd724ceffad857452d27c3c925 which addresses by setting the name attribute rather than id. I think that's more appropriate since there's no expectation for that to be unique. I confirmed that all the issues go away when running npm run dev.

mmkal commented 8 months ago

@dwmkerr bumping this - could we merge, if no objections?

dwmkerr commented 7 months ago

Looks good to me, merging now, thanks!