[Bug] Add regex to 'Link to Work' , 'Link to Creator Profile' and add numeric validation to 'Year of Creation'

soustab10 commented 1 year ago

Description

Add regex check in 'Link to Work' and 'Link to Creator Profile' field as it prevents users from entering random/incorrect URLs. Add check if year entered is a 4-digit number in the 'Year of Creation' field.

Reproduction

Step 1: Go to https://chooser-beta.creativecommons.org/
Step 2 : Fill the 'License Chooser' till Attribution Details'
Step 3 : Fill anything random in the URLs panel.
See that instead of giving a prompt/warning it accepts the data.

Expectation

The user should be prompted to enter a valid URL.

Screenshots

Environment

NA

Additional context

NA

Resolution

[x] I would be interested in resolving this bug.

possumbilities commented 1 year ago

@soustab10 this is an incredibly good find and look forward to your PR!

HoneyTyagii commented 1 year ago

@soustab10 Hey! I noticed that the issue #452 is still open. I'd love to help out and contribute to the project.

soustab10 commented 1 year ago

@HoneyTyagii I am working on it. Thank you.

Ankitsharma991 commented 1 year ago

@soustab10 can you add me (Sharmaa7861@gmail.com) to the slack channel, I am not able to join don't know why. I am interested in to contribute at CC through GSoC.

soustab10 commented 1 year ago

@soustab10 this is an incredibly good find and look forward to your PR!

@possumbilities I was thinking of adding a check to the "Year of Creation" field as well for users to enter only a valid year ( and not something like 12345). So, can you tell me what year range should I put in? Thank you.

possumbilities commented 1 year ago

@soustab10 Hmmm... This change is outside the scope of this Issue, which is presented as narrowly fixing validation only on the "link to work" field, and I'd ask that you make another Issue for that.

As to this specific ask: I don't think inputting a year range would be the best route. Validation should help, but start in a way that's not entirely restricting so it can be flexible and account for growth over time.

The only thing a year of creation would need to account for is that it's a 4 digit number. Going beyond that seems like adding too much complexity when this is a tool for creators to use, not a registration system where data is gathered and recorded.

soustab10 commented 1 year ago

ok then I will implement that. Thanks for helping me @possumbilities

possumbilities commented 1 year ago

@soustab10 Alternatively, you could expand this Issue to include the date input validation as well so that the changes in #456 are fully documented in an Issue somewhere.

Cronus1007 commented 1 year ago

@soustab10 Thanks for raising this issue. But I had a very great concern in this issue. @possumbilities Please have a look upon this since this issue has been discussed several times.

Form field validation is necessary before the data is sent to the server to prevent attacks such as SQL injection. Chooser does not send any information to the server, so it is not necessary. These values are only used to create the license code on the 'Mark Your Work' section. We have already discussed sanitizing user input in 177, so you can find more context there.

@soustab10 to elaborate, a single user is consuming their own input + output in one sitting, only on the client-side, so we don't need to validate the information at all. What they type is what they'll get. Sorry for pinging this thing so late. @possumbilities Would like to know more about your opinion?

possumbilities commented 1 year ago

@Cronus1007 Thanks for pulling up these past Issues and providing more context.

I would disagree that it's not necessary. Some form of validation is a requirement of good UX, the degree of detail that it involves would definitely be related to whether it's going to a server or not, but there should be some helpful shaping on what gets generated. The concern here is not malicious in nature as a priority, but instead whether the input could produce malformed output for the end-user or break the rendering of the page.

Generally speaking on the matter of security, XSS vulnerabilities rarely need input sent to a server, and largely rely on client side execution to accomplish them, so I'd say there is some level of marginal security concern on areas of focus like this. The chooser specifically shouldn't have this surface area of concern because of how its configured, but I wanted to point out to anyone reading this that just because something is input/output on the client doesn't necessarily mean it doesn't need validation from a security perspective and generally warrants a bit deeper consideration.

soustab10 commented 1 year ago

@Cronus1007 In the implementation of this issue, the user can still proceed with the process even if he enters an invalid URL as the Done button is not disabled. I think it is better to warn the user about the invalid URL and if required , change the warning message as well.

possumbilities commented 1 year ago

@soustab10 I would agree that if we are to do some level of UX validation shaping that it should perhaps impact the flow and warning message that the end-user interfaces with to align with that.

Other thoughts are that we could do some helpful nudging/shaping in a more subtle way.

Consider this scenario:

A user pastes in a URL minus the https/http at the beginning. Should the form check and prepend that at the beginning for them, or give them a warning it's missing?

I would lean towards checking if it's a link, and if it's missing the http or https, then the chooser prepends it, and avoids throwing an error at all. In this way there is a validation flow happening subtly under the hood that's shaping the output without prompting extra effort from the user. Yes, the placeholder text includes the https, but that doesn't necessarily mean users will follow that pattern.

Cronus1007 commented 1 year ago

@possumbilities Sure I got your point.

soustab10 commented 1 year ago

@soustab10 I would agree that if we are to do some level of UX validation shaping that it should perhaps impact the flow and warning message that the end-user interfaces with to align with that.

Other thoughts are that we could do some helpful nudging/shaping in a more subtle way.

Consider this scenario:

A user pastes in a URL minus the https/http at the beginning. Should the form check and prepend that at the beginning for them, or give them a warning it's missing?

I would lean towards checking if it's a link, and if it's missing the http or https, then the chooser prepends it, and avoids throwing an error at all. In this way there is a validation flow happening subtly under the hood that's shaping the output without prompting extra effort from the user. Yes, the placeholder text includes the https, but that doesn't necessarily mean users will follow that pattern.

I tested my PR code with the above scenario and in that case neither the user gets an error, nor is there a problem accessing the link from the final text. It seems browsers do it automatically.

Here's a screenshot:

possumbilities commented 1 year ago

@soustab10 I wouldn't rely on a browser to just correct a behavior for a non-standard URL or link, but it does seem there is some shaping happening at the moment to the generated html to help account for a leading http or not.

For example. If I type or paste in creativecommons.org into the field, the html generated is as follows:

<a property="dct:title" rel="cc:attributionURL" href="http://creativecommons.org">

It's adding the http:// on since I left it off, which is good.

If instead I type in any url that begins with http. For example: httpie.io I get:

<a property="dct:title" rel="cc:attributionURL" href="httpie.io">

That's not a valid URL. This is likely an outcome of the validation checking for http as a string alone, prepending the input.

It's certainly an edge case, but I think it illustrates the variability on shaping things subtley and to what degree, and what the trade-offs may be as a result.

soustab10 commented 1 year ago

@soustab10 I wouldn't rely on a browser to just correct a behavior for a non-standard URL or link, but it does seem there is some shaping happening at the moment to the generated html to help account for a leading http or not.

For example. If I type or paste in creativecommons.org into the field, the html generated is as follows:

<a property="dct:title" rel="cc:attributionURL" href="http://creativecommons.org">

It's adding the http:// on since I left it off, which is good.

If instead I type in any url that begins with http. For example: httpie.io I get:

<a property="dct:title" rel="cc:attributionURL" href="httpie.io">

That's not a valid URL. This is likely an outcome of the validation checking for http as a string alone, prepending the input.

It's certainly an edge case, but I think it illustrates the variability on shaping things subtley and to what degree, and what the trade-offs may be as a result.

Thank you for pointing this out. So should I work on this?

creativecommons / chooser