Open lucacasonato opened 4 years ago
3. Disallow any module names that have a levenshtein distance of less than 3 to any other existing module name, bad word, or reserved module name. (up for grabs)
Unless I'm misunderstanding I'm not sure how this can possibly work. E.g. eslint and tslint, or any two dictionary words that happen to be a letter apart https://listography.com/spamtastic/words/that_are_one_letter_apart let alone 2. What do npm or cargo do about this?
Unless I'm misunderstanding I'm not sure how this can possibly work. E.g. eslint and tslint, or any two dictionary words that happen to be a letter apart https://listography.com/spamtastic/words/that_are_one_letter_apart let alone 3.
I am not locked into the exact distance (if 1 gives desired results, we can do that). What we are trying to prevent is someone registering oak2 or oakk or 0ak, so that if you mistype or are not too familiar with Deno modules yet, you do not accidentally install the wrong module (that might be malicious). I don't want someone to publish color and someone else to publish colour. Things like that are so confusing.
Yeah, this means that some module names are not available, but I think that cost is worth it.
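To make the trade-off concrete, here is a minimal sketch of the distance check being discussed. It is not the registry's implementation: the helper name `tooClose` and the `minDistance` parameter are assumptions for illustration, and the threshold is left configurable because the exact distance is still open in this thread.

```typescript
// Standard dynamic-programming Levenshtein distance
// (single-character insertions, deletions, and substitutions).
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete from a
        curr[j - 1] + 1, // insert into a
        prev[j - 1] + cost, // substitute (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: existing names whose distance to the candidate
// is below the chosen threshold.
function tooClose(
  candidate: string,
  existing: string[],
  minDistance: number,
): string[] {
  return existing.filter((name) => levenshtein(candidate, name) < minDistance);
}
```

With this sketch, oak2, oakk, and 0ak are each at distance 1 from oak, so any threshold of 2 or more rejects them. The same is true of eslint versus tslint and color versus colour, which is exactly the objection raised above.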
What do npm or cargo do about this?
AFAIK npm does nothing about this (see https://www.npmjs.com/package/exxpress or https://www.npmjs.com/package/expres). For cargo I do not know.
I don't think it's worth it. Especially with Deno where you're more likely to get the correct URL by copy-pasting it from somewhere, the mistyping problem should be especially rare and we can chalk the rest of it up to personal responsibility. There just isn't a nice rhyme or reason to what words are close together in distance -- weird names can rule out ubiquitous names just by being there first. And as I said it's far too usual for common words to be a letter apart.
There are better solutions. Have a dictionary for things like color and colour to make them specially mutually exclusive. Allow reputable modules to "claim" similar names (as they would buy similar domains). Use down-scoring based on name similarity to something well-known.
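The dictionary idea could look roughly like the sketch below. Everything in it is hypothetical: the `VARIANT_GROUPS` table, its single entry, and the function names are made up for illustration and are not part of the registry.

```typescript
// Hypothetical table of spelling variants that should be mutually exclusive.
const VARIANT_GROUPS: string[][] = [
  ["color", "colour"],
];

// Map every variant to the first entry of its group.
const CANONICAL = new Map<string, string>();
for (const group of VARIANT_GROUPS) {
  for (const word of group) CANONICAL.set(word, group[0]);
}

// Names without a known variant are their own canonical form.
function canonicalName(name: string): string {
  return CANONICAL.get(name) ?? name;
}

// A candidate conflicts if an existing module shares its canonical form.
function conflictsWithExisting(candidate: string, existing: string[]): boolean {
  const key = canonicalName(candidate);
  return existing.some((name) => canonicalName(name) === key);
}
```

Under these assumptions colour is rejected once color exists, while tslint and eslint stay independent because nothing in the table links them.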
Perhaps this is a common issue in NPM where you mistype a letter and get the wrong module, but the URL system does request more attention from the user at the time of choosing a library.
Oak2 might be a completely valid name to submit in my opinion.
I can get started on the bad words filter 👌
Found a couple of lists that we could use for the comparison:
@lucacasonato what do you think?
@wperron Thanks! Any of those work, I'm sure...
Can't we just combine all three into one?
@wperron Do you think we should store with the source code, or as a table in the database that we check against?
I don't want to have the list just disappear from under us, so my plan was to copy the list into the project. Tbh, I don't know if creating a collection in Mongo just to store a couple of swear words is really worth it, plus putting it in the repository gives it a lot of visibility, we can link to the file in the README for example.
As for combining all three of them, yes of course we can 😛
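Combining the lists and checking a name against them might be sketched as follows. This is an assumption-heavy illustration, not the code from the actual PR: the function names are invented, and matching whole underscore-separated parts (rather than substrings) is one possible choice made here to avoid flagging innocent names that merely contain a listed word.

```typescript
// Merge any number of word lists into one de-duplicated, normalized set.
function mergeWordLists(...lists: string[][]): Set<string> {
  const merged = new Set<string>();
  for (const list of lists) {
    for (const word of list) {
      const normalized = word.trim().toLowerCase();
      if (normalized.length > 0) merged.add(normalized);
    }
  }
  return merged;
}

// Hypothetical check: flag a module name if any underscore-separated
// part of it is in the merged set.
function containsBadWord(moduleName: string, badWords: Set<string>): boolean {
  return moduleName
    .toLowerCase()
    .split("_")
    .some((part) => badWords.has(part));
}
```

Whether the merged list lives in the repository, a database, or elsewhere does not change this logic, only where the input arrays are loaded from.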
plus putting it in the repository gives it a lot of visibility
We might not want that. Getting around it is a lot easier then :-). A database collection makes it a lot harder to find which words are included.
@lucacasonato do you have a list of reserved module names ready to go? I could include that check in #81 while I'm at it
@wperron Reserved module names are now handled as unlisted modules without uploaded versions. Easier because we can store them in the DB that way.
Hi!
I'm making a package name validation library. https://github.com/TomokiMiyauci/is-valid-package-name/tree/beta/deno_land
Deno seems to confirm the contents of badwords.txt in S3 with the validation of the module name.
Is there a way for me to check the contents of badwords.txt?
Deno seems to confirm the contents of badwords.txt in S3 with the validation of the module name. Is there a way for me to check the contents of badwords.txt?
Not currently, the badwords.txt is stored on a private s3 bucket with public access blocked https://github.com/denoland/deno_registry2/blob/main/terraform/main.tf#L110-L134
Yeah, I checked.
putting it in the repository gives it a lot of visibility, we can link to the file in the README for example.
Do you plan to release the file?
Not at the moment, see Luca's answer above
@wperron Thank you for answering
We should automatically moderate the names of modules people are uploading. I think we can start with these three steps (ordered by priority):
color, queer, or africa). (up for grabs)